Preface


The third edition of this book continues the objective of providing coverage of actuarial 
mathematics in a flexible manner that meets the needs of several audiences. These range 
from those who want only a basic knowledge of the subject, to those preparing for careers 
as professional actuaries. All this is carried out with a streamlined system of notation, and a 
modern approach to computation involving spreadsheets. 

The text is divided into four parts. The first two cover the subject of life contingencies. 
The modern approach towards this subject is through a stochastic model, as opposed to 
the older deterministic viewpoint. I certainly agree that mastering the stochastic model 
is the desirable goal. However, my classroom experience has convinced me that this is not 
the right place to begin the instruction. I find that students are much better able to learn 
the new ideas, the new notation, the new ways of thinking involved in this subject, when 
done first in the simplest possible setting, namely a deterministic discrete model. After the 
main ideas are presented in this fashion, continuous models are introduced. In Part II of the 
book, the full stochastic model of life contingencies can be dealt with in a reasonably quick 
fashion. 

Another innovation in Part II is to depart from the conventional treatment of life contin- 
gencies as dealing essentially with patterns of mortality or disability in a group of human 
lives. Throughout Part II, we deal with general failure times, which makes the theory more
widely adaptable.

Part III deals with more advanced stochastic models. Following an introduction to stochas- 
tic processes, there is a chapter covering multi-state theory, an approach which unifies many 
of the ideas in Parts I and II. The final chapter in Part III is an introduction to modern financial 
mathematics. 

Part IV deals with the subject of risk theory, sometimes referred to as loss models. It
includes an extensive coverage of classical ruin theory, a topic that originated in actuarial 
science but recently has found many applications in financial economics. It also includes 
credibility theory, which will appeal to the reader interested more in the casualty side of 
actuarial mathematics. 

This book will meet the needs of those preparing for the examinations of many of the 
major professional actuarial organizations. Parts I to III of this new third edition cover all
of the material on the current syllabuses of Exam MLC of the Society of Actuaries and
Canadian Institute of Actuaries and Exam LC of the Casualty Actuarial Society, and cover
most of the topics on the current syllabus of Exam CT5 of the British Institute of Actuaries.
In addition, Part IV of the book covers a great deal of the material on Exam C of the Society
of Actuaries and Canadian Institute of Actuaries, including the topics of Frequency, Severity 
and Aggregate Models, Risk Measures, and Credibility Theory. 

The mathematical prerequisites for Part I are relatively modest, comprising elementary
linear algebra and probability theory, and, beginning in Chapter 8, some basic calculus. A
more advanced knowledge of probability theory is needed from Chapter 13 onward, and
this material is summarized in Appendix A. A usual prerequisite for actuarial mathematics is a
course in the theory of interest. Although this may be useful, it is not strictly required. All 
the interest theory that is needed is presented as a particular case of the general deterministic 
actuarial model in Chapter 2. 

A major source of difficulty for many students in learning actuarial mathematics is to 
master the rather complex system of actuarial notation. We have introduced some notational 
innovations, which tie in well with modern calculation procedures as well as allow us to 
greatly simplify the notation that is required. We have, however, included all the standard 
notation in separate sections, at the end of the relevant chapters, which can be read by those 
readers who desire this material. 

Keeping in mind the nature of the book and its intended audience, we have avoided 
excessive mathematical rigour. Nonetheless, careful proofs are given in all cases where these 
are thought to be accessible to the typical senior undergraduate mathematics student. For 
the few proofs not given in their entirety, mainly those involving continuous-time stochastic 
processes, we have tried at least to provide some motivation and intuitive reasoning for the 
results. 

Exercises appear at the end of each chapter. In Parts I and II these are divided up into 
different types. Type A exercises generally are those which involve direct calculation from the 
formulas in the book. Type B involve problems where more thought is involved. Derivations 
and problems which involve symbols rather than numeric calculation are normally included 
in Type B problems. A third type is spreadsheet exercises which themselves are divided 
into two subtypes. The first of these asks the reader to solve problems using a spreadsheet. 
Detailed descriptions of applicable Microsoft Excel® spreadsheets are given at the end of 
the relevant chapters. Readers of course are free to modify these or construct their own. The 
second subtype does not ask specific questions but instead asks the reader to modify the given 
spreadsheets to handle additional tasks. Answers to most of the calculation-type exercises 
appear at the end of the book. 

Sections marked with an asterisk * deal with more advanced material, or with special 
topics that are not used elsewhere in the book. They can be omitted on first reading. The 
exercises dealing with such sections are likewise marked with *, as are a few other exercises 
which are of above average difficulty. 

There are various ways of using the text for university courses geared to third or fourth 
year undergraduates, or beginning graduate students. Chapters 1 to 8 could form the basis 
of a one-semester introductory course. Part IV is for the most part independent of the first 
three parts, except for the background material on stochastic processes given in Chapter 18 
and would constitute another one-semester course. The rest of the book constitutes roughly 
another two semesters' worth of material, with possibly some omissions; Chapter 13 is not
needed for the rest of the book. Chapters 7 (except for Section 7.3.1), 9 and 12 deal with 
topics that are important in applications, but which are used minimally in other parts of the 
text. They could be omitted without loss of continuity. 


CHANGES IN THE THIRD EDITION


There are several additions and changes to the third edition. 

The most notable is a new Chapter 20 providing an introduction to the mathematics of 
financial markets. It has been long recognized that knowledge of this subject is essential to 
the management of financial risk that faces the actuary of today. 

Other additions include the following: 


* Chapter 12, on expenses, has been considerably enlarged to include the topic of profit
  testing.

* The chapter on multi-state models has been expanded to include discussion of reserves
  and profit testing in such models, as well as several additional techniques for
  continuous-time problems.

* Some extra numerical procedures have been included, such as Euler's method for
  differential equations, and the three-term Woolhouse formulas for fractional annuity
  approximations.

* An introduction to Brownian motion has been added to the material on continuous-time
  stochastic processes.

* The previous material on universal life and variable annuities has been rewritten and
  included in a new chapter dealing with miscellaneous topics. A brief discussion of
  pension plans is included here as well.

* Additional examples, exercises, and clarification have been added to various chapters.


In addition to these changes, the material has been reorganized. The previous two
chapters on stochastic processes have been combined into one and now appear earlier in the 
book as background for the multi-state and financial markets chapters. In the current Part IV, 
the detailed descriptions of the various distributions have been removed and added as a section 
to the Appendix on probability theory. 


Part I


THE DETERMINISTIC LIFE 
CONTINGENCIES MODEL 


Introduction and motivation 


1.1 Risk and insurance 


In this book we deal with certain mathematical models. This opening chapter, however, is a 
nontechnical introduction, designed to provide background and motivation. In particular, we 
are concerned with models used by actuaries, so we might first try to describe exactly what it 
is that actuaries do. This can be difficult, because a typical actuary is concerned with many 
issues, but we can identify two major themes dealt with by this profession. 

The first is risk, a word that itself can be defined in different ways. A commonly accepted 
definition in our context is that risk is the possibility that something bad happens. Of course, 
many bad things can happen, but in particular we are interested in occurrences that result 
in financial loss. A person dies, depriving family of earned income or business partners of 
expertise. Someone becomes ill, necessitating large medical expenses. A home is destroyed 
by fire or an automobile is damaged in an accident. No matter what precautions you take, 
you cannot rid yourself completely of the possibility of such unfortunate events, but what you 
can do is take steps to mitigate the financial loss involved. One of the most commonly used 
measures is to purchase insurance. 

Insurance involves a sharing or pooling of risks among a large group of people. The origins 
go back many years and can be traced to members of a community helping out others who 
suffered loss in some form or other. For example, people would help out neighbours who had 
suffered a death or illness in the family. While such aid was in many cases no doubt due to 
altruistic feelings, there was also a motivation of self-interest. You should be prepared to help 
out a neighbour who suffered some calamity, since you or your family could similarly be 
aided by others when you required such assistance. This eventually became more formalized, 
giving rise to the insurance companies we know today. 

With the institution of insurance companies, sharing is no longer confined to the scope of 
neighbours or community members one knows, but it could be among all those who chose 
to purchase insurance from a particular company. Although there are many different types 
of insurance, the basic principle is similar. A company known as the insurer agrees to pay
out money, which we will refer to as benefits, at specified times, upon the occurrence of 
specified events causing financial loss. In return, the person purchasing insurance, known as 
the insured, agrees to make payments of prescribed amounts to the company. These payments 
are typically known as premiums. The contract between the insurer and the insured is often 
referred to as the insurance policy. 

The risk is thereby transferred from the individuals facing the loss to the insurer. The 
insurer in turn reduces its risk by insuring a sufficiently large number of individuals, so that 
the losses can be accurately predicted. Consider the following example, which is admittedly 
vastly oversimplified but designed to illustrate the basic idea. 

Suppose that a certain type of event is unlikely to occur but if so, causes a financial loss 
of 100 000. The insurer estimates that about 1 out of every 100 individuals who face the 
possibility of such loss will actually experience it. If it insures 1000 people, it can then expect 
10 losses. Based on this model, the insurer would charge each person a premium of 1000. 
(We are ignoring certain factors such as expenses and profits.) It would collect a total of 
1 000 000 and have precisely enough to cover the 100 000 loss for each of the 10 individuals 
who experience this. Each individual has eliminated his or her risk, and in so far as the estimate 
of 10 losses is correct, the insurer has likewise eliminated its own risk. (We comment further 
on this statement in the next section.) 
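
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the deterministic calculation described above; the variable names are our own, and expenses and profits are ignored, as in the text.

```python
# Deterministic pooling example (expenses and profits ignored).
n_insured = 1000             # number of people insured
loss_probability = 1 / 100   # estimated chance that one person suffers the loss
loss_amount = 100_000        # financial loss caused by the event

expected_losses = n_insured * loss_probability    # number of losses expected
total_benefits = expected_losses * loss_amount    # total amount to be paid out
premium = total_benefits / n_insured              # premium charged to each person

print(f"expected losses: {expected_losses:.0f}")
print(f"total benefits:  {total_benefits:.0f}")
print(f"premium each:    {premium:.0f}")
```

The premium is simply the expected loss per person, which is why total premiums exactly match total benefits when the estimate of 10 losses is correct.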

We conclude this section with a few words on the connection between insurance and 
gambling. Many people believe that insurance is really a form of the latter, but in fact it is 
exactly the opposite. Gambling trades certainty for uncertainty. The amount of money you 
have in your pocket is there with certainty if you do not gamble, but it is subject to uncertainty 
if you decide to place a bet. On the other hand, insurance trades uncertainty for certainty. The 
uncertain drain on your wealth, due to the possibility of a financial loss, is converted to the 
certainty of the much smaller drain of the premium payments if you insure against the loss. 


1.2 Deterministic versus stochastic models 


The example in Section 1.1 illustrates what is known as a deterministic model. The insurer 
in effect pretends it will know exactly how much it will pay out in benefits and then charges 
premiums to match this amount. Of course, the insurer knows that it cannot really predict 
these amounts precisely. By selling a large number of policies, it hopes to benefit from
the diversification effect. It is really relying on the statistical concept known as the
‘law of large numbers’, which in this context intuitively says that if a sufficiently large 
number of individuals are insured, then the total number of losses will likely be close to the 
predicted figure. 

To look at this idea in more detail, it may help to give an analogy with flipping coins. If we 
flip 100 fair coins, we cannot predict exactly the number of them that will come up heads, but 
we expect that most of the time this number should be close to 50. But ‘most of the time’ does 
not mean always. It is possible, for example, that we may get only 37 heads, or as many as 63,
or even more extreme outcomes. In the example given in the last section, the number of losses 
may well turn out to be more than the expected number of 10. We would like to know just how 
unlikely these rare events are. In other words, we would like to quantify more precisely just 
what the words ‘most of the time’ mean. To achieve this greater sophistication a stochastic 
model for insurance claims is needed, which will assign probabilities to the occurrence of 
various numbers of losses. This will allow adjustment of premiums in order to allow for the
risk that the actual number of losses will deviate from that expected. We will however begin 
the study of actuarial mathematics by first developing a deterministic approach, as this seems 
to be the best way of learning the basic concepts. After mastering this, it is not difficult to turn 
to the more realistic stochastic setting. 
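
To give a first idea of what such a stochastic model might compute, the following sketch assigns probabilities to the outcomes mentioned above. It assumes, purely for illustration, that the coin flips (and likewise the individual losses) are independent, so that the counts follow a binomial distribution; the function names are our own and are not notation used elsewhere in the book.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k occurrences in n independent trials,
    each occurring with probability p (an assumed binomial model)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_most(k, n, p):
    """Probability of k or fewer occurrences."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))

# Coin example: 100 fair coins.
heads_37_or_fewer = prob_at_most(37, 100, 0.5)
heads_63_or_more = 1 - prob_at_most(62, 100, 0.5)

# Insurance example: 1000 insured, each with a 1 in 100 chance of loss.
more_than_10_losses = 1 - prob_at_most(10, 1000, 0.01)

print(f"P(37 or fewer heads) = {heads_37_or_fewer:.4f}")
print(f"P(63 or more heads)  = {heads_63_or_more:.4f}")
print(f"P(more than 10 losses) = {more_than_10_losses:.4f}")
```

Figures of this kind are what allow the words 'most of the time' to be made precise, and they indicate how much extra premium might be needed to guard against more losses than expected.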

We will not get into all the complications that can arise. In actual coin flipping it seems 
clear that the results of each toss are independent of the others. The fact that one coin comes
up heads is not going to affect the outcomes of the others. It is this independence which is
behind the law of large numbers, and which results in outcomes that are usually close to what 
is expected. There are some risks, often referred to as systematic or non-diversifiable, where 
the independence assumption fails, and which can adversely affect all or a large number of 
members of a group at the same time. For example, a spreading epidemic could cause life or 
health insurers to pay more in claims than they expected. Selling more policies in order to 
diversify would not help their financial situation. It could in fact make it worse, if the premiums 
were not sufficient to cover the extra losses. Severe climatic disturbances causing storms could 
impact property insurance in the same way. In 2008, falling real estate prices in the United 
States affected mortgage lenders and those who insured mortgage lenders against bad debts, 
to the extent that this helped trigger a global financial crisis. A detailed discussion of these 
matters is not within the scope of this work, and for the most part, the stochastic model we 
present will confine attention to the usual insurance model where the risks are considered as 
independent. It should be kept in mind however that the detection and avoidance of systematic 
risk are matters that the actuary must always be aware of. 


1.3 Finance and investments 


The second theme involved in an actuary’s work is finance and investments. In most of the 
types of insurance that we focus on in this book, an additional complicating factor is the 
long-term nature of the contracts. Benefits may not be paid until several years after premiums 
are collected. This is certainly true in life insurance, where the loss is occasioned by the death 
of an individual. Premiums received are invested and the resulting earnings can be used to help 
provide the benefits. Consider the simple example given above, and suppose further that the 
benefits do not have to be paid until 1 year after the premiums are collected. If the insurer can 
invest the money at, say, 5% interest for the year, then it does not need to charge the full 1000 
in premium, but can collect only 1000/1.05 from each person. When invested, this amount 
will provide the necessary 1000 to cover the losses. Again, this example is oversimplified and 
there are many more complications. We will, in the next chapter, consider a mathematical 
model that deals with the consequences of the payments of money at various times. A much 
more elaborate treatment of financial matters, incorporating randomness, is presented in 
Chapter 20. 
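
The effect of the 5% interest on the premium can be verified directly. The short sketch below uses our own variable names and assumes, as in the example, a single year between collecting premiums and paying benefits.

```python
interest_rate = 0.05        # rate earned on invested premiums for the year
needed_per_person = 1000    # amount required per person one year from now

# Premium collected now that accumulates to the needed amount in one year.
premium = needed_per_person / (1 + interest_rate)
accumulated = premium * (1 + interest_rate)

print(f"premium collected now: {premium:.2f}")
print(f"value after one year:  {accumulated:.2f}")
```

The premium falls from 1000 to roughly 952.38, with the investment earnings making up the difference.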


1.4 Adequacy and equity 


We can now give a general description of the responsibilities of an actuary. The overriding 
task is to ensure that the premiums, together with investment earnings, are adequate to provide 
for the payment of the benefits. If this is not true, then it will not be possible for the insurer to 
meet its obligations and some of the insureds will necessarily not receive compensation for
their losses. The challenge in meeting this goal arises from the several areas of uncertainty. 
The amount and timing of the benefits that will have to be paid, as well as the investment 
earnings, are unknown and subject to random fluctuations. The actuary makes substantial use 
of probabilistic methods to handle this uncertainty. 

Another goal is to achieve equity in setting premiums. If an insurer is to attract purchasers, 
it must charge rates that are perceived as being fair. Here also, the randomness means that it is 
not obvious how to define equity in this context. It cannot mean that two individuals who are 
charged the same amount in premiums will receive exactly the same back in benefits, for that 
would negate the sharing arrangement inherent in the insurance idea. While there are different 
possible viewpoints, equity in insurance is generally expected to mean that the mathematical 
expectation of these two individuals should be the same. 


1.5 Reassessment 


Actuaries design insurance contracts and must initially calculate premiums that will fulfill the 
goals of adequacy and equity, but this is not the end of the story. No matter how carefully 
one makes an initial assessment of risks, there are too many variables to be able to achieve 
complete accuracy. Such assessments must be continually re-evaluated, and herein lies the 
real expertise of the actuary. This work may be compared to sailing a ship in a stormy sea. 
It is impossible to avoid being blown off course occasionally. The skill is to detect when 
this occurs and to take the necessary steps to continue in the right direction. This continual 
monitoring and reassessing is an important part of the actuary’s work. A large part of this 
involves calculating quantities known as reserves. We introduce this concept in Chapter 2 and 
then develop it more fully in Chapter 6. 


1.6 Conclusion 


We can now summarize the material found in the subsequent chapters of the book. We will 
describe the mathematical models used by the actuary to ensure that an insurer will be able 
to meet its promised benefits payments and that the respective purchasers of its contracts are 
treated equitably. In Part I, we deal with a strictly deterministic model. This enables us to 
focus on the main principles while keeping the required mathematics reasonably simple. In 
Part II, we look at the stochastic model for an individual insurance contract. In Part III, we 
look at more advanced stochastic models and introduce the mathematics of financial markets. 
In Part IV, we consider models that encompass an entire portfolio of insurance contracts. 


The basic deterministic model 


2.1 Cash flows 


As indicated in the previous chapter, a basic application of actuarial mathematics is to model 
the transfer of money. Insurance companies, banks and other financial institutions engage in 
transactions that involve accepting sums of money at certain times, and paying out sums of 
money at other times. 

To construct a model for describing this situation, we will first fix a time unit. This can 
be arbitrary, but in most applications it will be taken as some familiar interval of time. For 
convenience we will assume that time is measured in years, unless we indicate otherwise. We 
will let time 0 refer to the present time, and time t will then denote t time units in the future.
We also select an arbitrary unit of capital. In this chapter, we assume that all funds are paid out
or received at integer time points, that is, at time 0, 1, 2, .... The amount of money received
or paid out at time k will be called the net cash flow at time k and denoted by c_k. A positive
value of c_k denotes that a sum is to be received, whereas a negative value indicates that a sum
is paid out. The entire transaction is then described by listing the sequence of cash flows. We
will refer to this as a cash flow vector,


c = (c_0, c_1, ..., c_N),


where N is the final duration for which a payment is made. 

For example, suppose I lend you 10 units of capital now and a further 5 units a year from 
now. You repay the loan by making three yearly payments of 7 units each, beginning 3 years 
from now. The resulting cash flow vector from my point of view is 


c = (-10, -5, 0, 7, 7, 7).


From your point of view, the transaction is represented by -c = (10, 5, 0, -7, -7, -7).
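
A cash flow vector translates naturally into a list (or a spreadsheet column) indexed by time. The sketch below represents the loan example in Python; the names are ours, and the representation is only one convenient choice.

```python
# Cash flow vector from the lender's point of view: entry k is the net
# cash flow at time k (negative = paid out, positive = received).
lender = [-10, -5, 0, 7, 7, 7]

# The borrower sees the same transaction with every sign reversed.
borrower = [-x for x in lender]

for k, (c_k, b_k) in enumerate(zip(lender, borrower)):
    print(f"time {k}: lender {c_k:+d}, borrower {b_k:+d}")
```
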

One of our main goals in this chapter is to provide methods for analyzing transactions in
terms of their cash flow vectors. There are several basic questions that could be asked: 


* When is a transaction worthwhile undertaking?

* How much should one pay in order to receive a certain sequence of cash flows?

* How much should one charge in order to provide a certain sequence of cash flows?

* How does one compare two transactions to decide which one is preferable?


All of these questions are related, and we could answer all of them if we could find a 
method to put a value on a sequence of future cash flows. If all cash flows were paid at the 
same time, or if the value of money did not depend on the time that a payment was made, 
the problem would reduce to one of simple addition. We could simply value a cash flow 
sequence by adding up all the payments. We cannot proceed in this naive way, however, and
must consider the time value of money. It is a basic economic fact that we prefer present to 
future consumption. We want to eat the chocolate bar now, rather than tomorrow. We want 
to enjoy the new car today, rather than next month. This means of course that money paid to 
us today is worth more than money paid in the future. We are no doubt all very familiar with 
this fact. We pay interest for the privilege of borrowing money today, which lets us consume 
now, or we advance money to others, giving up our present consumption and expecting to 
be compensated with interest earnings. In addition, there is the effect of risk. If we are given 
a unit of money today, we have it. If we forego it now in return for future payments, there 
could be a chance that the party who is supposed to make remittance to us may be unable 
or unwilling to do so, and we expect to be compensated for the possible loss. A major step in
answering the above questions is to quantify this dependence of value on time. 

Readers who have taken courses on the theory of compound interest will be familiar 
with many of the ideas. However, our treatment will be somewhat different than that usually 
given. One reason for this is that we want to develop the concepts in such a way that they 
are applicable to more general situations, as given in Chapters 3 to 5. A second reason is that
our approach is designed to be compatible with modern-day computing methods such as 
spreadsheets. 

To conclude this section, we remark that many complications arise when the cash flows 
are not exactly known in advance. They may depend on several factors, including random 
elements. There may be complicated interrelationships between the various cash flows. These 
matters involve advanced topics in finance and actuarial mathematics and for the most part 
will not be dealt with in this book. In Part I we deal mainly with a simplified model, where all 
cash flows are fixed and known in advance. In later parts of the book we will consider certain 
aspects of randomness, but will not get into the full extent of complications that can arise. 


2.2 An analogy with currencies 


To motivate the basic ideas, we will consider first a completely different problem, which is 
nonetheless related to that introduced above. Suppose that I give you 300 Canadian dollars, 
200 US dollars and 100 Australian dollars. How much money did I give you? It would be 
naive indeed to claim that you received 600 dollars, for clearly the currencies are of different 
value. To answer the question we will need conversion factors that allow us to deduce the 
value of each type of dollar in terms of others. Let v(c, u) denote the value in Canadian dollars
of one US dollar. Assume that v(c, u) = 1.05, which means that a US dollar is worth 1.05 
Canadian dollars. (Our numbers here are for purposes of illustration only. They are close to the 
conversion rates at the time of writing, but they may well have changed considerably by the 
time you are reading this.) Similarly, letting a stand for Australian, we will assume that v(c, a) 
equals 0.95, which means 95 cents Canadian will buy one Australian dollar. The convention 
we are using here, which should be noted for later use, is that the v function returns the value 
of one unit of the second coordinate currency in terms of the first coordinate currency. 

There are four more conversion factors of interest, but an important fact is that they can all 
be deduced from just these two (or indeed from any two that have a common first or common 
second coordinate). We note first that if it takes 1.05 Canadian dollars to buy 1 US dollar, 
then a single Canadian dollar is worth 1/1.05 = 0.9524 US dollars. That is,


v(u, c) = v(c, u)^{-1} = 0.9524,    v(a, c) = v(c, a)^{-1} = 1.0526,


where we use similar reasoning for the Australian dollar. 

Next consider v(u, a). We want the amount of US dollars needed to buy one Australian 
dollar. We could conceivably effect this purchase in two stages, first using US money to buy 
Canadian, and then using Canadian to buy Australian. Working backwards, it will take 0.95 
Canadian to buy 1 Australian, and it will take v(u, c) × 0.95 US dollars to buy the 0.95 Canadian.
To summarize,


v(u, a) = v(u, c)v(c, a) = 0.9048.


Our calculations are completed with


v(a, u) = v(u, a)^{-1} = 1.1052.


The reader may notice, given a typical real-life listing of currency prices, that the rela- 
tionships we state here do not hold exactly, but that is due to commissions and other charges. 
In the absence of these, they must necessarily hold. 

Let us now return to the original problem of determining how much I paid you. We
must first select a currency to express the answer in. For example, we could say that the total
was equivalent to 300 + 200v(c, u) + 100v(c, a) = 605 Canadian dollars. We could also say
that the total was equivalent to 300v(u, c) + 200 + 100v(u, a) = 576.20 US dollars. Notice, as a
shortcut, that we did not need to compute the latter sum (which could be a significant saving in
calculation if we had several rather than just three currencies). If the total amount is equivalent
to 605 Canadian, then it must also be equivalent to 605v(u, c) = 576.20 US dollars. Similarly,
the total in Australian dollars can be computed as 605v(a, c) or alternatively as 576.20v(a, u),
both of which are equal to 637 (approximately as there are some rounding differences). 
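
The whole currency calculation can be reproduced in a few lines. The sketch below stores only the two given factors v(c, u) and v(c, a) and derives every other factor from them; the dictionary layout and function names are our own, and the rates are the illustrative ones used in the text.

```python
# Value in Canadian dollars of one unit of each currency:
# v(c, c) = 1, v(c, u) = 1.05, v(c, a) = 0.95.
value_in_canadian = {"c": 1.0, "u": 1.05, "a": 0.95}

def v(x, y):
    """Value, in currency x, of one unit of currency y.
    Uses v(x, y) = v(x, c) v(c, y) = v(c, y) / v(c, x)."""
    return value_in_canadian[y] / value_in_canadian[x]

# Amounts handed over: 300 Canadian, 200 US, 100 Australian.
amounts = {"c": 300, "u": 200, "a": 100}

def total_in(x):
    """Total value of all the amounts, expressed in currency x."""
    return sum(amount * v(x, y) for y, amount in amounts.items())

for x in ("c", "u", "a"):
    print(f"total in currency {x}: {total_in(x):.2f}")
```

Because the code carries full precision rather than four rounded decimals, its results may differ from the figures in the text in the last digit or so.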


2.3 Discount functions


We now go back to the original situation. We want to value a sequence of cash flows, which are
all in the same currency, but which are paid at different times. Conversion factors are needed
to convert the value of money paid at one time to that paid at another. The principles involved
are exactly the same as in Section 2.2. Let v(s, t) denote the value at time s, of 1 unit paid
at time t. (Note again that our convention is that the 1 unit goes with the second coordinate.
In other words, 1 unit paid at time t is equivalent to v(s, t) paid at time s.) In the case where
s < t, you can interpret v(s, t) as the amount you must invest at time s in order to accumulate
1 at time t. In the case where s > t you can interpret v(s, t) as the amount that you will have
accumulated at time s from an investment of 1 at time t. The fundamental relationship that
must be satisfied is the same as we noted with currencies, namely


v(s, t)v(t, u) = v(s, u), for all s, t, u. (2.1)


Due to its importance, we will repeat the reasoning for this fundamental fact in the current 
context. It is simply that 1 unit at time u is equivalent to v(t, u) at time t, and this v(t, u) at time
t is equivalent to v(s, t)v(t, u) at time s, showing that 1 unit at time u is indeed equivalent to
v(s, t)v(t, u) at time s.

We now make a formal definition. 


Definition 2.1 A discount function is a positive valued function v, of two nonnegative 
variables, satisfying (2.1) for all values of s, t, u.


Other desirable features follow immediately from (2.1). Taking s = t = u, we deduce that 
v(s, s)v(s, s) = v(s, s) and, since v(s, s) is nonzero, we verify the obvious relationship


v(s, s) = 1, for all s. (2.2)


From this we deduce that v(s, t)v(t, s) = v(s, s) = 1 so that we recover the relationship, noted 
in the currency case, that 


v(s, t) = v(t, s)^{-1}. (2.3)


Although we have called v a ‘discount’ function, the common English usage of the word 
really applies to the case where s < t. In that case, v(s, t) will be normally less than 1, and
the function is returning the discounted amount of 1 unit paid at a later date. Some authors
would prefer to define the discount function to apply only to this case and then define another
function, called an accumulation function, to cover the case where s is greater than t. In that
case, the function returns the accumulated amount from an investment of 1 unit at an earlier 
date, which will normally be greater than or equal to 1. We find it more convenient to use only 
the one function as given above, since the ideas, and the key relationship (2.1), are the same 
regardless of the ordering on s, t and u. 

The concept of a discount function will be a key ingredient in what follows. We will 
suppose that, given any financial transaction, there is a suitable discount function that governs 
the investment of all funds. We will deal only briefly with the important problem of choosing 
the discount function. There are many factors governing this choice and it will depend on the 
nature of the transaction. It may be chosen to simply reflect the preferences of the parties for 
present as opposed to future consumption. It may reflect the desired return that an investor 
wishes to achieve. In many cases it is based on a prediction of market conditions that will 
determine what returns can be expected on invested capital. 

A possible complication that we will not deal with to any great extent is that which arises 
when two parties to a transaction have different choices of a suitable discount function. We 
will assume, unless indicated otherwise, that the same function applies to both parties. 


2.4 Calculating the discount function 


We now provide a procedure for calculating the values of v in a systematic way. The currency 
example indicated that we can calculate all the values of a discount function just by knowing 
those at points with a fixed first coordinate, that is, with a common comparison point. In most 
applications it is convenient to take this as time 0. To simplify notation, we drop the first 
coordinate in this case and define 


v(t) = v(0, t). 


It follows from (2.1) that v(s, t) = v(s, 0)v(0, t) and then from (2.3) that 


v(s, t) = v(t) / v(s). (2.4) 


In the first few chapters of this book, we will need to know the value of v(s, t) only 
for integral values of s and t. From (2.4), it will be sufficient to know v(n) where n is any 
nonnegative integer. 

To calculate v(n), note first that the key relationship (2.1) can be extended from one 
involving three terms to an arbitrary number. That is, given times t_1, t_2, ..., t_n, 


v(t_1, t_2)v(t_2, t_3) ... v(t_{n-1}, t_n) = v(t_1, t_n). (2.5) 


To see this, take for example, n = 4. The quantity v(t_1, t_2)v(t_2, t_3)v(t_3, t_4) is equal to 
v(t_1, t_3)v(t_3, t_4) by applying (2.1) to the first two terms. By another application of (2.1) it 
is equal to v(t_1, t_4). We have extended (2.1) to a formula involving four terms. A similar step 
extends from four to five, and so on. Formally, we are using mathematical induction. 

It follows from (2.5) that 


v(n) = v(0, 1)v(1, 2) ... v(n - 1, n), (2.6) 


so we need only know v(n - 1, n) for all positive integers n. Given such values, we can use 
the recursion formula 


v(n) = v(n - 1)v(n - 1, n), v(0) = 1, (2.7) 


to calculate all values of v(n). The information we need is then summarized by the vector 


v = (v(0), v(1), v(2), ..., v(N)), 


where N is the final duration at which a nonzero cash flow occurs. 
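
As a minimal sketch of recursion (2.7) and formula (2.4), the following Python fragment builds the vector v from the one-period factors. The function names and the sample factors are illustrative assumptions and do not come from the text.

```python
from itertools import accumulate
from operator import mul


def discount_vector(one_period_factors):
    """Return [v(0), ..., v(N)] from the factors [v(0,1), ..., v(N-1,N)].

    Uses recursion (2.7): v(0) = 1 and v(n) = v(n-1) * v(n-1, n).
    """
    return list(accumulate(one_period_factors, mul, initial=1.0))


def discount(v, s, t):
    """Two-argument discount function v(s, t) = v(t) / v(s), as in (2.4)."""
    return v[t] / v[s]


# Sample one-period factors, chosen only for illustration.
v = discount_vector([0.5, 0.8, 0.5])
print(v)                  # v(0), v(1), v(2), v(3)
print(discount(v, 1, 3))  # discounting from time 3 back to time 1
print(discount(v, 3, 1))  # accumulating from time 1 to time 3
```

Storing only the vector v is enough, since every v(s, t) with integer arguments is recovered by a single division.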


2.5 Interest and discount rates 


In practice, rather than specifying v(k - 1, k) directly, it is more common to deduce this 
quantity from the corresponding rates of interest or discount. Given any discount function v 
and a nonnegative integer k, these are defined as follows. 


Definition 2.2 The rate of interest for the time interval k to k + 1 is the quantity 

i_k = v(k + 1, k) - 1. 

Definition 2.3 The rate of discount for the time interval k to k + 1 is the quantity 

d_k = 1 - v(k, k + 1). 

Note that an investment of 1 unit at time k will produce v(k + 1, k) = 1 + i_k units at time 
k + 1. Similarly, an investment of 1 - d_k = v(k, k + 1) units at time k will accumulate to 1 unit 
at time k + 1. So d_k is the amount you must take off from each unit paid at the end of the 
period, to get the equivalent amount at the beginning of the period. 


Given any of the three quantities v(k, k + 1), i_k or d_k, we can easily obtain the other two. 
For example, using the definitions and (2.3), it is straightforward to deduce that 


d_k = i_k v(k, k + 1) = i_k / (1 + i_k), 

i_k = d_k v(k + 1, k) = d_k / (1 - d_k). (2.8) 


Remark In most cases we expect i_k will be a nonnegative number, although from the 
definition it could in theory take on any value greater than -1. Indeed, people who make 
investments yielding less than the rate of inflation are in effect receiving a negative interest 
rate. We will not be concerned with inflation in this text however, and the reader can generally 
assume that interest and discount rates are nonnegative, unless otherwise specified. 


Remark The reader is cautioned that some other authors use a different convention for 
the subscript k in interest and discount rates. They would refer to our i_0 as i_1, since it is the 
interest rate for the first time interval, and in general would use i_{k+1} for our i_k. We find it more 
convenient to start all indexing at 0. 
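
The conversions among v(k, k + 1), i_k and d_k can be sketched in a few lines of Python. The function names are hypothetical, and the sample value 0.8 is only an illustration.

```python
def rates_from_factor(v_k):
    """Given v(k, k+1), return (i_k, d_k) using Definitions 2.2 and 2.3."""
    i_k = 1.0 / v_k - 1.0   # i_k = v(k+1, k) - 1, where v(k+1, k) = 1 / v(k, k+1)
    d_k = 1.0 - v_k         # d_k = 1 - v(k, k+1)
    return i_k, d_k


def discount_from_interest(i_k):
    """d_k = i_k / (1 + i_k), one of the relations in (2.8)."""
    return i_k / (1.0 + i_k)


def interest_from_discount(d_k):
    """i_k = d_k / (1 - d_k), the other relation in (2.8)."""
    return d_k / (1.0 - d_k)


i_k, d_k = rates_from_factor(0.8)
print(i_k, d_k)  # roughly 0.25 and 0.2
print(discount_from_interest(0.25), interest_from_discount(0.2))
```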


2.6 Constant interest 


Readers who have previously studied compound interest will be familiar with one particular 
family of discount functions. Suppose we believe that accumulation of invested funds depends 
only on the length of time for which the capital is invested, and not on the particular starting 
time. That is, we postulate that for all nonnegative s, t, h, 


v(s, s + h) = v(t, t + h). (2.9) 


If this holds, then 


v(s + t) = v(0, s + t) = v(0, s)v(s, s + t) = v(0, s)v(0, t) = v(s)v(t). 


We have a familiar functional equation, and it is well known that if we assume some regularity 
condition, for example, that v be continuous, we must have v(t) = v^t and therefore that 


v(s, t) = v^{t-s} 


for some constant v. 

For such a discount function, the rate of interest i_k is a constant i = v^{-1} - 1, and the 
rate of discount d_k is a constant d = 1 - v. The discount function is therefore conveniently 
given by simply stating a single parameter, which is usually taken as i, the constant rate of 
interest. So, for example, if we want to know how much we will accumulate at time n from 
an investment of 1 at time 0, this is just v(n, 0) = (1 + i)^n, which is the usual starting point 
for the subject of compound interest in elementary texts. In the pre-calculator, pre-computer 
age, this constant interest family of discount functions was almost always used, mainly to 
facilitate the computation. Many current textbooks on this subject still deal largely with this 
constant interest case. However, with modern computing methods, such as spreadsheets, the 
extra effort involved in using a general discount function is negligible, and there is no reason 
to restrict the flexibility that one can achieve. Throughout this book we will use arbitrary 
discount functions, although occasionally we will restrict discussion to the constant interest 
case in order to simplify the notation. One of the main advantages of not restricting ourselves 
to constant interest will be apparent in Chapters 4 and 5 where we will be able to incorporate 
the contingencies of life and death into the discount function. A key point in these chapters 
is that the calculation of premiums for life insurance and life annuities can be expressed as a 
special case of interest theory, using a general discount function. 
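
For readers who like to experiment, here is a small sketch of the constant interest family. The rate of 5% and the function name are assumptions made purely for illustration.

```python
import math

i = 0.05                 # assumed constant rate of interest
v = 1.0 / (1.0 + i)      # one-period discount factor


def discount(s, t):
    """Constant-interest discount function v(s, t) = v ** (t - s)."""
    return v ** (t - s)


# Property (2.9): the factor depends only on the elapsed time h = 3.
print(math.isclose(discount(2, 5), discount(7, 10)))
# Key relationship (2.1): v(s, t) v(t, u) = v(s, u).
print(math.isclose(discount(1, 4) * discount(4, 9), discount(1, 9)))
# Accumulation of 1 unit over n = 10 periods equals (1 + i) ** n.
print(math.isclose(discount(10, 0), (1.0 + i) ** 10))
```

All three comparisons should hold up to floating-point rounding, which is why `math.isclose` is used rather than exact equality.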


2.7 Values and actuarial equivalence 


We now put together the two key concepts of cash flow vector and discount function. Suppose 
we are given a cash flow vector c = (c_0, c_1, ..., c_N) and a discount function v. We want to 
calculate the single payment at time zero that is equivalent to all the cash flows, assuming 
that the time value of money is modeled by the given discount function v. This amount is 
commonly referred to as the present value of the sequence of cash flows, and sometimes 
abbreviated as P.V. We can think of it as the amount we would pay at time zero in order to 
receive all of the cash flows, or equivalently as the single payment that we would be willing 
to accept now in lieu of all these future cash flows. The cash flow at time k has a present value 
of c_k v(k) by definition of the discount function. When c_k > 0 this is what we have to pay now 
in order to receive c_k at time k. When c_k < 0, receiving -c_k v(k) now will let us pay out -c_k 
at time k. 
We then add up the individual present values to get 


Present value of all cash flows = \sum_{k=0}^{N} c_k v(k). (2.10) 


Example 2.1 To make sure this is understood, take a very simple example. Let v(k) = 2^{-k} 
for all k. This is a constant interest rate per period of 100%. In other words, money doubles 
itself every period. This is of course not very realistic for a period of a year, but it could 
hold for a sufficiently long interval. Suppose we are to receive 12 units at time 2, but will be 
required to pay out 8 units at time 3. Find the present value, and verify that it makes sense. 


Solution. From (2.10) the present value is 12v(2) — 8v(3) = 12(1/4) — 8(1/8) = 2. To verify 
this, we note that the 12 units received at time 2 will accumulate to 24 at time 3. We then 
have to pay out 8, leaving an accumulation of 16 units by time 3. Contrast this with receiving 
instead a single payment of 2 at time 0. This will accumulate to 4 at time 1, 8 at time 2 and 
16 at time 3. Therefore assuming (as we do throughout) that all money accumulates according 
to the given discount function, we are in exactly the same position in both cases. 
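
The arithmetic of Example 2.1 can be checked with a short Python sketch. The helper name `present_value` is an assumption for illustration; the figures are those of the example.

```python
def present_value(cash_flows, v):
    """Formula (2.10): the sum of c_k * v(k) over all durations k."""
    return sum(c_k * v(k) for k, c_k in enumerate(cash_flows))


def v(k):
    """Discount function of Example 2.1: money doubles every period."""
    return 2.0 ** (-k)


c = [0, 0, 12, -8]                 # receive 12 at time 2, pay 8 at time 3
pv = present_value(c, v)           # 12/4 - 8/8

# Accumulate the cash flows, and the single payment pv, to time 3.
acc_from_flows = sum(c_k * 2.0 ** (3 - k) for k, c_k in enumerate(c))
acc_from_single = pv * 2.0 ** 3

print(pv, acc_from_flows, acc_from_single)
```

Both accumulations agree, which is the sense in which the single payment at time 0 is equivalent to the two cash flows.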


Remark Unrealistic interest rates will be frequently used in examples and exercises 
throughout the book, in order to simplify the numerical computation, and allow the reader 
to concentrate on the underlying concepts. So for example, we will often take i_k = 20%, 
25%, 50%, 100%, which correspond respectively to v(k, k + 1) equal to 5/6, 4/5, 2/3, 1/2. 
Calculation is even easier when we have an interest rate of 0, in which case v(k) = 1 for all k. 


Note that it was convenient in the above example to compare the amounts accumulated at 
the time of the last payment. This is known as the accumulated value and in general is given by 


\sum_{k=0}^{N} c_k v(N, k). 


It represents the amount we will have at time N, resulting from all the cash flows. 

More generally, we can calculate a value at any time between 0 and N. In this chapter we 
concentrate on integer times. We take the present value of the future cash flows at that time, 
plus the accumulated values of the past cash flows. The following definition formulates this 
precisely. 


Definition 2.4 For any time n = 0, 1, ..., N, the value at time n of the cash flow vector c with 
respect to the discount function v is given by 


Val_n(c; v) = \sum_{k=0}^{N} c_k v(n, k). 


It represents that single amount that we would accept at time n in place of all the other cash 
flows, assuming that everything accumulates according to the discount function v. 


The values at various times are related in a simple way. Since v(m, k) = v(m, n)v(n, k), it 
follows immediately that 


Val_m(c; v) = Val_n(c; v)v(m, n). (2.11) 


(Formulas marked with a special symbol denote key facts, of particular importance.) 

Formula (2.11) should be intuitively clear. The single amount we would accept at time 
m in place of all the cash flows must be the value at time m of the single payment that we 
would accept at time n, in place of all the cash flows. We can then easily deduce all values 
from the value at a particular time. Normally it will be convenient to take this as time 0. The 
same point was illustrated in the currency example of Section 2.2, where we pointed out that 
the total amount in any one currency was easily converted to the total in any other by a single 
multiplication. 


Notation For the particular case of values at time 0 we will use a special symbol. Let 

ä(c; v) = Val_0(c; v). 


The letter a is a standard actuarial symbol that is used to stand for annuity, another name for 
a sequence of periodic payments. See Section 2.14 for an explanation of the two dots. 


When there is only one discount function under consideration we often suppress the v and 
just write Val_k(c) or ä(c). 
We can express and calculate ä conveniently by expressing it in vector form: 


ä(c) = v · c = v c^T. (2.12) 


The second term is the (scalar) inner product of the two vectors. The third views the vectors 
v and c as 1 × (N + 1) matrices, with the superscript T denoting a matrix transpose. 

Formulas (2.11) and (2.12) make it clear that calculating values is a linear operation, a 
fact we will often exploit. That is, 


Val_k(c + d) = Val_k(c) + Val_k(d), Val_k(αc) = α Val_k(c), (2.13) 


for any cash flow vectors c and d, scalar α and duration k. 
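
The following sketch illustrates the definition of a value at time n, the relation (2.11) and the linearity property (2.13). The function name `value_at`, the 25% interest rate and the two sample vectors are assumptions chosen only for illustration.

```python
import math


def value_at(n, c, v):
    """Val_n(c; v): the sum of c_k * v(n, k), using v(n, k) = v[k] / v[n]."""
    return sum(c_k * v_k for c_k, v_k in zip(c, v)) / v[n]


v = [0.8 ** k for k in range(4)]   # assumed: constant interest of 25%
c = [20.0, 10.0, 0.0, 0.0]
d = [0.0, 0.0, 5.0, -3.0]

pv = value_at(0, c, v)             # the inner product of v and c, as in (2.12)

# (2.11) with m = 3 and n = 0: Val_3(c) = Val_0(c) * v(3, 0).
lhs = value_at(3, c, v)
rhs = pv * (v[0] / v[3])
print(math.isclose(lhs, rhs))

# (2.13): the value of a sum is the sum of the values.
c_plus_d = [c_k + d_k for c_k, d_k in zip(c, d)]
print(math.isclose(value_at(2, c_plus_d, v),
                   value_at(2, c, v) + value_at(2, d, v)))
```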

We have been comparing the values of a sequence of cash flows with a single payment at 
a particular time. We often wish to compare the values of two sequences of cash flows. For 
this we have the following definition. 


Definition 2.5 Two cash flow vectors c and e are said to be actuarially equivalent with 
respect to the discount function v if, for some nonnegative integer n, 


Val_n(c; v) = Val_n(e; v). 

From (2.11), if the above holds for some n, it holds for all n. 


Take, for example, n = N. We see that a person taking the payments given by c and letting 
them accumulate according to the given discount function v will eventually be in exactly the 
same financial position as one taking the payments given by e. This is the meaning of actuarial 
equivalence. 

Many problems in actuarial mathematics reduce to the following. We are given a cash flow 
vector c, and another cash flow vector e that depends on some unknown parameters. We have 
to solve for the unknown parameters in order to make the two vectors actuarially equivalent. 
We have already done a simple example of this in calculating Val_n(c). In that case we wanted 
a single payment at time n that is actuarially equivalent to c. In another common application, 
a lender advances payments to a borrower, and the borrower must return the money by loan 
repayments. The lender is therefore trading one sequence of payments (the advances) for 
another (the repayments) and wants the two to be actuarially equivalent. We look at a simple 
example. 


Example 2.2 A lends B 20 units now and another 10 units at time 1. B promises to repay 
the loan by two payments, made at times 2 and 3. The repayment at time 3 is to be twice 
as much as that at time 2. If A wishes to earn interest of 25% per period, what should these 
repayments be? 


Solution. Let K be the unknown payment at time 2. We want to find K so that the vectors 
c = (20, 10, 0, 0) and e = (0, 0, K, 2K) are actuarially equivalent. There are many possible 
calculation methods. We could determine the vector v and use formula (2.12), which is 
essentially the best approach for a spreadsheet method, as we describe later in Section 2.14. 
For small problems to be done by hand calculation, it is convenient to make use of a time 
diagram, where we indicate the payments and the one-period discount factors v(k, k + 1), which 
are all equal to (1.25)^{-1} = 0.8. See Figure 2.1. Readers may find it useful to write down their 
own time diagrams for the examples in the book, if they are not given. 


Advances        20      10      0       0
                |       |       |       |
Time            0       1       2       3
Repayments      0       0       K       2K
v's                 0.8     0.8     0.8

Figure 2.1 Example 2.2 


Making use of formula (2.6), we calculate values at time 0 by multiplying each payment 
c_k by v(k), which we calculate as the product of all the preceding discount factors v(i, i + 1). 
So, 


P.V. of advances = 20 + 10(0.8) = 28 
P.V. of repayments = K[(0.8)^2 + 2(0.8)^3] = 1.664K. 


Equating values to make the advances and repayments actuarially equivalent, K = 28/1.664 = 
16.83. The borrower pays 16.83 at time 2 and 33.66 at time 3. 
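
The same calculation can be laid out in vector form, much as one would in a spreadsheet. This is only a sketch; the variable names are assumptions, and the repayment vector is written as K times a fixed pattern so that linearity can be used to solve for K.

```python
v = [0.8 ** k for k in range(4)]      # v(k) for interest of 25% per period
advances = [20.0, 10.0, 0.0, 0.0]
pattern = [0.0, 0.0, 1.0, 2.0]        # repayments are K times this pattern


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


pv_advances = dot(advances, v)        # 20 + 10(0.8)
pv_pattern = dot(pattern, v)          # (0.8)^2 + 2(0.8)^3
K = pv_advances / pv_pattern          # actuarial equivalence, using linearity

print(round(pv_advances, 4), round(pv_pattern, 4), round(K, 2))
```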


We conclude this section by describing a useful technique that we will call the replacement 
principle. Suppose we are given a cash flow vector and some subset of the indices {0, 1, ..., N}. 
Take the value at time k of just those cash flows in the subset and then replace all entries in the 
subset by a single payment at time k equal to that value. This leaves a vector that is actuarially 
equivalent to the original. A formal derivation can be given by writing the vector as the sum 
of two vectors and using linearity. We will leave the details to the interested reader. 


Benefits    2       4       -3      -5
            |       |       |       |
Time        0       1       2       3
                            1
Benefits    2       0       -3      0
            |       |       |       |
Time        0       1       2       3


Figure 2.2 Example 2.3 


The following is a simple, although artificial example. We will give more relevant uses of 
this principle later. See Figure 2.2. 


Example 2.3 Let 
c = (2, 4, -3, -5). 


Assume a constant interest rate of 0.25. Find the actuarially equivalent vector by applying the 
replacement principle with time k = 2 and the subset {1, 3}. 


Solution. The value of c_1 and c_3 at time 2 is 4(1.25) - 5(1.25)^{-1} = 1. Making the replacement, 
we obtain the vector (2, 0, -2, 0) that is actuarially equivalent to c, as can be verified by direct 
calculation. 
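
A minimal sketch of the replacement principle, applied to Example 2.3, is given below. The function name `replace_subset` is hypothetical, and the discount vector assumes the constant interest rate of 0.25 used in the example.

```python
def replace_subset(c, v, subset, k):
    """Replace the entries of c indexed by subset with one payment at time k.

    The single payment is the value at time k of the replaced cash flows,
    computed with v(k, j) = v[j] / v[k].
    """
    value_at_k = sum(c[j] * v[j] for j in subset) / v[k]
    new = [0.0 if j in subset else float(c_j) for j, c_j in enumerate(c)]
    new[k] += value_at_k
    return new


v = [0.8 ** j for j in range(4)]        # constant interest rate of 0.25
c = [2.0, 4.0, -3.0, -5.0]
new_c = replace_subset(c, v, {1, 3}, 2)

pv_old = sum(c_j * v_j for c_j, v_j in zip(c, v))
pv_new = sum(c_j * v_j for c_j, v_j in zip(new_c, v))
print([round(x, 6) for x in new_c], round(pv_old, 6), round(pv_new, 6))
```

Comparing the two present values is the direct verification mentioned in the solution.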


2.8 Vector notation 


We now introduce some convenient notation for vectors. In all examples, we will have 
a maximum duration N, and all of our vectors will be (N + 1)-dimensional, with entries 
indexed from 0 to N. However, we will often write a vector of lower dimension with the 
understanding that all subsequent entries will be zero. For example if N = 8, then (1, 3, 2) 
will denote the vector (1, 3, 2, 0, 0, 0, 0, 0, 0). For vectors c and d, written as above, we 
will write (c, d) to denote the vector consisting of the entries of c followed by those of 
d. For example if c = (2, 2), and d = (3, 7, 4, 1) then (c, d) = (2, 2, 3, 7, 4, 1, 0, ..., 0). (Note 
that the juxtaposition must come before filling in the ending zero entries.) For any number 
r and integer k we will write (r_k) to denote the vector consisting of k entries of r. For 
example, (1_3, 2_5) = (1, 1, 1, 2, 2, 2, 2, 2, 0, ..., 0). We let e^i denote the standard ith basis vector, 
i = 0, 1, 2, ..., N, that is, a vector with an entry of 1 in position i and zeros elsewhere. 
For a given vector b = (b_0, b_1, ..., b_n), let 


Δb = (b_0, b_1 - b_0, b_2 - b_1, ..., b_n - b_{n-1}, -b_n). 


That is, Δb is obtained from b by subtracting from each entry the immediately preceding entry, 
except for the entry in position 0, which remains the same. (Think of the entry in position -1 
as 0.) Note also that by our notational convention, b_j = 0 for j > n, which accounts for the 
final entry of -b_n. As a check, the sum of all the entries in Δb must be 0. (The reader should 
also note that our definition differs from some authors' usage, whereby Δb_k would equal 
b_{k+1} - b_k, rather than b_k - b_{k-1}, as we have defined it.) 

We use the symbol * to denote pointwise multiplication of vectors, that is, 


(a_1, a_2, ..., a_n) * (b_1, b_2, ..., b_n) = (a_1 b_1, a_2 b_2, ..., a_n b_n). 


The remainder of this section can be omitted on first reading. We develop an identity 
which is somewhat technical but which will be very useful in later chapters. 

For any vector b = (b_0, b_1, ..., b_n) and a discount function v, define a new vector ∇b 
whose entry of index k is given by 


∇b_k = b_k - v(k, k + 1)b_{k+1}. (2.14) 


So ∇ is something like Δ except we subtract in the reverse order and discount the second 
term. Note that ∇ depends on v and it could be denoted as ∇_v, if there is any confusion. 
Our main result is that for any other vector c = (c_0, c_1, ...) 


ä(∇b * c) = ä(b * Δc). (2.15) 


To see this, we just note that the left hand side is 


(b_0 - v(0, 1)b_1)c_0 + (b_1 - v(1, 2)b_2)c_1 v(1) + (b_2 - v(2, 3)b_3)c_2 v(2) + ... 


Expanding and using the fact that v(k)v(k, k + 1) = v(k + 1), this equals 


b_0 c_0 + b_1(c_1 - c_0)v(1) + b_2(c_2 - c_1)v(2) + ..., 


which is the right hand side. We can remember this formula as saying that when computing 
the present value of a pointwise product of one vector multiplied by Δ applied to a second 
vector, the Δ can slide off the second vector and become a ∇ on the first. 
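
Identity (2.15) is easy to test numerically. The sketch below uses hypothetical helper names and arbitrary sample vectors; both vectors are taken to have the same length M, with all entries beyond position M - 1 equal to zero, so the extra final entry of the difference vector (which would multiply a zero) is omitted.

```python
import math
from itertools import accumulate
from operator import mul


def delta(c):
    """Entries 0..M-1 of the difference vector: c_k - c_{k-1}, with c_{-1} = 0."""
    return [c_k - prev for prev, c_k in zip([0.0] + c[:-1], c)]


def nabla(b, one_period):
    """Entries b_k - v(k, k+1) * b_{k+1}, with b_M taken as 0, as in (2.14)."""
    nxt = b[1:] + [0.0]
    return [b_k - f * b_next for b_k, f, b_next in zip(b, one_period, nxt)]


def pv(c, v):
    """Present value: the inner product of c with the discount vector v."""
    return sum(c_k * v_k for c_k, v_k in zip(c, v))


one_period = [0.6, 0.5, 0.4, 0.5]                  # sample v(k, k+1), k = 0..3
v = list(accumulate(one_period[:-1], mul, initial=1.0))   # v(0), ..., v(3)
b = [5.0, 1.0, 4.0, 2.0]                           # arbitrary sample vectors
c = [3.0, 6.0, 1.0, 2.0]

lhs = pv([x * y for x, y in zip(nabla(b, one_period), c)], v)
rhs = pv([x * y for x, y in zip(b, delta(c))], v)
print(math.isclose(lhs, rhs))
```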


2.9 Regular pattern cash flows 


Before the introduction of computers and calculators, finding values of cash flows could 
involve a lengthy computation. Accordingly, the emphasis was not only on constant interest 
rates, but also on vectors where all nonzero cash flows were of the same amount, usually 
referred to as the case of level cash flows, or where the cash flows followed some regular 
pattern, such as entries increasing in arithmetic progression. Values in this case were fairly 
easy to compute by algebraic means. One would then try to handle more general cash flow 
vectors by expressing them in terms of those with regular patterns. Readers who have had 
previous experience with compound interest courses have no doubt seen such techniques. With 
modern computing methods there is little need for these methods for computing purposes, 
although they are sometimes useful for theoretical matters. We give a brief illustration of some 
of the more useful formulas of this type. We assume throughout the section that the discount 
function is given by v(n) = v^n for some constant v. 

Consider the vectors (1_n) and j^n = (1, 2, ..., n - 1, n). Then 


ä(1_n) = 1 + v + v^2 + ... + v^{n-1}. 


Multiplying by v, 


v ä(1_n) = v + v^2 + ... + v^n. 


Subtracting the second equation from the first and dividing by 1 - v gives 


ä(1_n) = (1 - v^n) / (1 - v). (2.16) 


Many readers will have seen this technique for summing a geometric progression. A similar 
trick handles the vector j^n by reducing to the level payment case. 


ä(j^n) = 1 + 2v + 3v^2 + ... + n v^{n-1}. 


Multiplying by v, 


v ä(j^n) = v + 2v^2 + ... + n v^n. 


Subtracting the second equation from the first gives 


(1 - v) ä(j^n) = (1 + v + v^2 + ... + v^{n-1}) - n v^n, 


and dividing by (1 - v), 


ä(j^n) = (ä(1_n) - n v^n) / (1 - v). (2.17) 

There is an alternate way to derive (2.16) and (2.17), as well as many other similar 
formulas, which does not involve any series summation. It is based on the fact that a loan may 
be paid off by paying interest each period on the prior amounts advanced, and then paying 
off the total principal at the end. This is intuitively clear. A formal derivation will be given in 
Section 2.11. 

Suppose you receive a loan of 1 unit. You could repay i at the end of each period for n 
periods and then eventually repay the principal at time n. The repayment vector is therefore 
(0, i, i, ..., i, i + 1), which by the replacement principle is actuarially equivalent to (d, d, ..., 
d, 1) = d(1_n) + e^n, since a cash flow of d at any time k is actuarially equivalent to d(1 + i) = i at 
time k + 1. (In other words, if you pay interest at the beginning of the year, the appropriate rate 
is d rather than i.) Equating the present value of the advances and repayments, 1 = d ä(1_n) + v^n, 
which leads to (2.16). 

For the second formula, suppose you are to receive loan advances of 1 unit at the beginning 
of each year for n years. According to our scheme you will pay i units of interest at time 1, 
2i units of interest at time 2, etc. You then will repay the total principal of n at time n. The 
repayment vector is (0, i, 2i, ..., (n - 1)i, ni + n). By replacing each i by a d one period earlier, 
this is actuarially equivalent to d j^n + n e^n. Equating present values of this with the vector of 
loan advances gives ä(1_n) = d ä(j^n) + n v^n, which leads to (2.17). 
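
As a quick numerical check of (2.16) and (2.17), the closed forms can be compared with direct summation. The values v = 0.8 and n = 5 below are assumptions chosen only for illustration.

```python
import math

v = 0.8   # assumed constant discount factor
n = 5     # assumed number of payments
d = 1.0 - v

# Direct summation of the present values.
level_direct = sum(v ** k for k in range(n))                 # value of (1_n)
increasing_direct = sum((k + 1) * v ** k for k in range(n))  # value of j^n

# Closed forms (2.16) and (2.17).
level_closed = (1.0 - v ** n) / (1.0 - v)
increasing_closed = (level_closed - n * v ** n) / (1.0 - v)

print(math.isclose(level_direct, level_closed))
print(math.isclose(increasing_direct, increasing_closed))

# The loan argument: 1 = d * (value of 1_n) + v^n.
print(math.isclose(1.0, d * level_direct + v ** n))
```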


2.10 Balances and reserves 


2.10.1 Basic concepts 


This section will introduce one of the most fundamental actuarial concepts, that of a reserve. 

Suppose we enter into a financial transaction, represented by the cash flow vector c. At any 
future time k there are two fundamental quantities to compute. First, we would like to know 
the total amount accumulated from all payments of the transaction up to this point. Second, 
we would like to know how much money we will need at that time in order to discharge our 
future obligations under the transaction. We illustrate with a simple example. 


Example 2.4 Let 
c = (3, 6, 1, 2, -20), v(0, 1) = 0.6, v(1, 2) = 0.5, v(2, 3) = 0.4, v(3, 4) = 0.5. 


How much do we have, and how much will we need at time 2, just before the 1-unit payment 
due at that time? 


See Figure 2.3, where we have inserted an arrow to indicate the time that values are taken. 


Payments    3       6       1       2       -20
            |       |       |       |       |
Time        0       1       2       3       4
                            ↑
v's             0.6     0.5     0.4     0.5


Figure 2.3 Example 2.4 


Solution. The amount we have is clearly the 3 units paid to us at time 0, accumulated for two 
periods, and the 6 units paid at time 1 accumulated for one period, for a total amount of 


3v(2, 0) + 6v(2, 1) = 3v(2, 1)v(1, 0) + 6v(2, 1) = 3 / (0.6 × 0.5) + 6 / 0.5 = 22. 

For the second question, note first that we have an obligation to pay out 20 units at time 
4, and the amount we need at time 2 to provide for this is 20v(2, 4) = 20v(2, 3)v(3, 4) = 
20 x 0.4 x 0.5 = 4. We can offset this with the positive cash flows that we will acquire after 
time 2. The value at time 2 of the 1 unit due immediately at time 2 is just 1, and the value at 
time 2 of the 2 units payable at time 3 is just 2v(2, 3) = 0.8. The total needed to ensure we 
can meet our obligations is 4 — 1 — 0.8 = 2.2. 

Let us verify this directly. Suppose we have 2.2 units at time 2, just before the payment 
due at that time. The payment of 1 will come in to give us 3.2. This will accumulate to 
(3.2/0.4) = 8 units at time 3. We will receive another 2 units at time 3 for a total of 10, and 
this will accumulate to the 20 units that we need at time 4 in order to meet our obligation at 
that time. 

We now introduce some notation and terminology to express these concepts in a gen- 
eral case. 


Notation Given any cash flow vector c and nonnegative integer k, let 

{}_k c = (c_0, c_1, ..., c_{k-1}, 0, ..., 0), {}^k c = (0, 0, ..., 0, c_k, c_{k+1}, ..., c_N) 

so that 

c = {}_k c + {}^k c. (2.18) 


For example, for the vector c in Example 2.4, {}_2 c = (3, 6, 0, 0, 0) while {}^2 c = (0, 0, 1, 2, -20). 

The idea is that {}_k c represents the past cash flows and {}^k c the future cash flows when 
measured from time k. It is important to note that the payment at exact time k is by our 
convention taken as future. (We will see in Chapter 6 that this fits in with the usual treatment 
for life insurance contracts.) Note also that {}_0 c is the zero vector while {}^0 c is just c. 


Definition 2.6 For k = 0, 1, ..., N, the balance at time k with respect to c and v is defined 
by 


B_k(c; v) = Val_k({}_k c; v) = \sum_{j=0}^{k-1} c_j v(k, j). 


(We will, as before, suppress the v when there is no confusion.) The balance at time k is simply 
the accumulated amount at time k resulting from all the payments received up to that time, 
and answers the question of how much we will have. Note again that by our conventional 
treatment, balances are computed just before the payment at exact time k is made, so that the 
time k payment is not included in B_k. 


Definition 2.7 For k = 0, 1, ..., N, the reserve at time k with respect to c and v is defined by 


{}_k V(c; v) = -Val_k({}^k c; v) = -\sum_{j=k}^{N} c_j v(k, j). 



The reserve at time k is the negative of the value at time k of the future payments, and is equal 
to the amount we will need in order to meet future obligations. It is important to remember 
the negative sign, to reflect the fact that we are measuring the value of net obligations, that is, 
amounts to be paid out of the fund. 
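
The two definitions translate directly into code. The sketch below, with hypothetical function names, applies them to the cash flows and one-period discount factors of Example 2.4.

```python
from itertools import accumulate
from operator import mul

c = [3.0, 6.0, 1.0, 2.0, -20.0]
one_period = [0.6, 0.5, 0.4, 0.5]                      # v(k, k+1), k = 0..3
v = list(accumulate(one_period, mul, initial=1.0))     # v(0), ..., v(4)


def balance(k, c, v):
    """B_k: value at time k of the cash flows strictly before time k."""
    return sum(c[j] * v[j] for j in range(k)) / v[k]


def reserve(k, c, v):
    """kV: negative of the value at time k of the cash flows from time k on."""
    return -sum(c[j] * v[j] for j in range(k, len(c))) / v[k]


print(round(balance(2, c, v), 6))   # how much we have at time 2
print(round(reserve(2, c, v), 6))   # how much we need at time 2
```

Note that both functions treat the payment at exact time k as a future payment, following the convention stated above.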

Referring again to Example 2.4, we calculated that B_2(c; v) = 22 and {}_2 V(c; v) = 2.2. 

The terminology and notation for reserves were borrowed from a particular life insurance 
application (which we will discuss in detail in Chapter 6). In keeping with this standard 
reserve notation, we have placed duration as a left rather than right subscript. We introduce 
the reserve concept now, in order to stress that its use is more general than just this particular 
life insurance application and can apply to any sequence of cash flows. One can think of 
this quantity as representing capital that must be set aside or ‘reserved’ for future use. For 
those familiar with accounting terminology, reserves, if positive, represent liabilities, which 
are amounts that we owe to other parties. 

It is possible of course for the reserve to be negative, which indicates that at time k the 
present value of the future amounts coming in for this transaction will exceed the present value 
of the future amounts to be paid out. In such a case, the absolute value of the reserve is a ‘receiv- 
able’, that is, an amount that is owed to us and that will be received through future payments. 


2.10.2 Relation between balances and reserves 
From (2.18) and the linearity of Val_k, 


Val_k(c) = B_k(c) - {}_k V(c), (2.19) 


which says that the value of the transaction at any point is equal to the difference between 
what you actually have accumulated and what you need to set aside from that accumulation 
to meet future obligations. An important consequence of this is that 


for a zero-value cash flow vector c, B_k(c) = {}_k V(c), (2.20) 


where a zero-value vector is one whose value is zero at all durations. Such vectors arise 
frequently. If c and d are actuarially equivalent, then for any duration k, 


Val_k(c - d) = Val_k(c) - Val_k(d) = 0, 


so that c — d is a zero-value vector. 

For a typical example, suppose that an individual borrows money, and agrees to repay with 
a sequence of repayments. Consider the net cash flows from the borrower's point of view, 
that is, advances less repayments. If the advances and repayments are actuarially equivalent, 
these net cash flows form a zero-value vector. What is the outstanding balance on the loan at 
time k? This is the value of all the payments yet to be made, which by definition is just {}_k V(c). 
In view of (2.20) it is also equal to B_k(c). This is perfectly logical. It says that the amount 
still owing at time k is the value of all the money that has been received, offset by the value of 
the repayments that have been made. Calculating the outstanding balance by means of the 
reserve is often called the prospective method since it looks to the future, while calculating by 
means of the balance is often called the retrospective method, since it looks to the past. For 
this reason, many authors refer to what we have called a balance as a retrospective reserve. 


Example 2.5 (Figure 2.4). An individual borrows 1000 now and another 2000 at the end 
of 1 year. This loan will be repaid by yearly payments for 10 years, beginning 5 years from 
the present. The yearly payment doubles after 5 years. The interest rate is 0.06 for the first 
5 years and 0.07 thereafter. How much is still owing 8 years from now, prior to the payment 
at that time? How much is still owing after the payment at time 8? 


Advances    1000 2000 0  0  0  0  0  0  0  0  0   0   0   0   0
Time        0    1    2  3  4  5  6  7  8  9  10  11  12  13  14
Repayments  0    0    0  0  0  K  K  K  K  K  2K  2K  2K  2K  2K


Figure 2.4 Example 2.5 


Solution. We must first determine the repayment amounts. Let K be the initial repayment. If 
c denotes the advances and d denotes repayments, then 


c = (1000, 2000), 
d = (0, 0, 0, 0, 0, K, K, K, K, K, 2K, 2K, 2K, 2K, 2K) = K(0_5, 1_5, 2_5), 
ä(c) = 2886.79, ä(d) = 7.95326K, 


and equating, K = 362.97 (see Section 2.15 for the calculation details). So the first five 
instalments are each 362.97, and the second five are each 725.94. The amount owing at time 
8, just prior to the payment due on that date, can be calculated either as 


B_8(c - d) = Val_8(e), where e = (1000, 2000, 0_3, -362.97_3), 

or as 

{}_8 V(c - d) = Val_8(f), where f = (0_8, 362.97_2, 725.94_5). 


In either case the amount is 3483.97. The outstanding balance at time 8, after the payment 
due at that time, is just 3483.97 - 362.97 = 3121.00. 

It is of interest to note that the lower payments at the beginning have the effect that the 
borrower is not repaying enough to handle the interest on the loan, so the amount owing is 
greater than the total amount borrowed. 
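
The figures in Example 2.5 can be checked with a few lines of code. The following is a minimal sketch, not the spreadsheet method of Section 2.15; the helper names `discount_factors` and `value_at` are our own, and the discount function is assumed to be built from one-period rates in the usual way.

```python
def discount_factors(rates):
    """v(0), v(1), ..., v(n) from one-period rates i_0, ..., i_{n-1}."""
    v = [1.0]
    for i in rates:
        v.append(v[-1] / (1.0 + i))
    return v

def value_at(k, cash_flows, v):
    """Val_k(c): value at time k of all the cash flows in c."""
    return sum(c * v[j] for j, c in enumerate(cash_flows)) / v[k]

rates = [0.06] * 5 + [0.07] * 10                # 0.06 for five years, then 0.07
v = discount_factors(rates)

advances = [1000, 2000]
unit_repayments = [0] * 5 + [1] * 5 + [2] * 5   # the vector d divided by K

# Equate the present values of advances and repayments to solve for K.
K = value_at(0, advances, v) / value_at(0, unit_repayments, v)

# Retrospective: advances less the repayments made before time 8.
past = [1000, 2000, 0, 0, 0, -K, -K, -K]
balance = value_at(8, past, v)

# Prospective: the repayments due at time 8 and later.
future = [0] * 8 + [K] * 2 + [2 * K] * 5
reserve = value_at(8, future, v)

print(round(K, 2), round(balance, 2), round(reserve, 2))
```

Both methods should agree, since the same discount function is used for each; the amount after the payment at time 8 is then `balance - K`.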


2.10.3 Prospective versus retrospective methods 


For the model that we have described so far, either the prospective or retrospective approach 
can be used to determine the amount owing on a loan. In pre-computer days, one often chose 
the one that allowed easier calculation, but with modern methods it makes little difference. 

There are, however, situations where one approach or the other dominates. The retrospec- 
tive approach must be used in cases where the time and amounts of loan repayments are not 
scheduled in advance, but can be made at the option of the borrower. 

Consider a variation on Example 2.5. Suppose the borrower had made payments of 300 at 
time 5, 400 at time 6 and 500 at time 7, and the payments after that were not yet determined. To 
obtain the outstanding balance at time 8 we would necessarily use the retrospective method and 
calculate the balance. This is $\mathrm{Val}_8(b)$, where b = (1000, 2000, 0, 0, 0, -300, -400, -500, 0). 

On the other hand, the prospective approach figures prominently in cases where the choice 
of an appropriate discount function may have changed since the transaction was originally 
entered into. In that case the reserve calculated with respect to a new discount function that 
is realistic at that time, could well give an amount that differs from that given by the balance 
as calculated retrospectively. After all, the equality of these two depended on the fact that the 
same discount function was used for both. We will not dwell too much on this complication, 
but would like to describe two very familiar situations. 

Suppose the borrower wishes to discharge a loan at some point prior to the scheduled 
completion. The natural amount to pay would be the outstanding balance on the loan at that 
date, as calculated by either method. Suppose, however, that the loan contract requires that 
payments be made as scheduled and does not permit you to repay early. This is often the 
case with home mortgages. Suppose, further, that interest rates have gone down since the loan 
was originally taken out. The lender will not welcome early repayment as these amounts will 
have to be reinvested at a lower rate. Borrowers may however be permitted to repay, if they 
include an extra amount as a penalty. What is really happening is that the lender is revaluing 
the reserve, by applying the new discount function to the future repayments. If interest rates 
decrease, the value of the function v increases. Assuming the normal case where entries in 
the vector ${}^k c$ are nonpositive as they represent repayments, the negatives of these values are 
nonnegative, and multiplying by the higher values of v and summing will lead to a higher 
reserve. The penalty represents the excess value of the reserve as calculated by an up-to-date 
discount function, over the original value. (It should be noted that, in practice, the penalty 
amounts are often determined by approximate formulas rather than an exact recalculation of 
the reserve.) 
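
The revaluation just described can be illustrated numerically. The sketch below uses hypothetical figures of our own choosing (five remaining annual repayments of 100, an original rate of 6% and a current rate of 4%), and the helper name `reserve` is not from the text.

```python
def reserve(repayments, rate):
    """Value now of repayments due at times 1, 2, ... at a constant rate."""
    return sum(p / (1.0 + rate) ** t for t, p in enumerate(repayments, start=1))

remaining = [100.0] * 5                  # hypothetical repayments still to come
original = reserve(remaining, 0.06)      # reserve at the rate in the contract
revalued = reserve(remaining, 0.04)      # reserve at the current, lower rate
penalty = revalued - original            # excess of the revalued reserve

print(round(original, 2), round(revalued, 2), round(penalty, 2))
```

The lower rate gives larger discount factors, so the revalued reserve exceeds the original one and the difference plays the role of the penalty.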

As a second example, suppose you purchase a bond, which means that you are in the 
position of the lender. The nature of a bond is that this debt is assignable to another party by 
trading in the bond market. You sell the bond to someone who will then collect the repayments 
by the issuer. What is a fair price? The answer is obviously the outstanding balance, not as 
calculated originally, but rather with respect to the rates of interest applicable at the time of 
sale. As in the case above, lower interest rates will cause this outstanding balance to increase. 
Traders who buy bonds always hope that interest rates will fall, so the outstanding balances 
increase, and they can sell the bonds for a profit in the market. 


2.10.4 Recursion formulas 


If f is a function defined on the nonnegative integers, it is often useful to derive a recursion 
formula, that is, a formula that expresses f(k + 1) in terms of f(k). Given an initial value f(0), 
one can then successively calculate values of f(k) for all k. A recursion 
formula often leads to a difference formula that gives an expression for f(k + 1) - f(k). 

There are many recursion formulas in actuarial mathematics. For the most part, they can 
be derived from the basic recursions for balances and reserves that we present in this section. 

We will first express $B_{k+1}$ in terms of $B_k$. For any cash flow vector c and duration k, it is 
clear from the definitions that 


$${}_{k+1}c = {}_k c + c_k e^k.$$
The value at time k of the second term on the right is $c_k$ and the value at time k of the first 
term on the right is $B_k(c)$. Invoking linearity and multiplying by v(k + 1, k) to get the value at 
time k + 1, gives the recursion 


$$B_{k+1}(c) = (B_k(c) + c_k)\, v(k + 1, k), \tag{2.21}$$


where we start the recursion with the initial value, $B_0 = 0$. 
Subtracting $B_k$ from each side and expressing v(k + 1, k) as $1 + i_k$, yields the difference 
formula 


$$B_{k+1}(c) - B_k(c) = i_k (B_k(c) + c_k) + c_k. \tag{2.22}$$


Difference equations are not normally efficient for calculating numbers. Rather, they are 
useful for analyzing how quantities change from one period to another. For example, the 
above formula expresses the increase (or decrease if negative) in the balance over a period as 
a sum of two quantities, the interest earned plus the additional cash flow. 

Recall that our convention was to treat cash flows at time k as future with respect to k. 
This ties in well with insurance contracts, as we will see later. However, the other convention 
is normally used when dealing with loans. Accordingly, we define 


$$\tilde{B}_k(c) = B_k(c) + c_k,$$


the accumulated amount at time k after the cash flow at time k is paid. We leave it to the reader 
to verify the corresponding recursion and difference equations as 


$$\tilde{B}_{k+1}(c) = \tilde{B}_k(c)\, v(k + 1, k) + c_{k+1}, \tag{2.23}$$
$$\tilde{B}_{k+1}(c) - \tilde{B}_k(c) = i_k \tilde{B}_k(c) + c_{k+1}. \tag{2.24}$$


Rewriting (2.24), we get 
$$-c_{k+1} = i_k \tilde{B}_k(c) + [\tilde{B}_k(c) - \tilde{B}_{k+1}(c)], \tag{2.25}$$


which forms the basis for the usual loan amortization schedules. (These are schedules that 
show the balances at each duration and how they change.) The left hand side above is the 
amount repaid on the loan at time k + 1 (the negative of the negative cash flow) and the formula 
gives the split of this repayment into two parts. The first is the interest on the outstanding 
balance, and the second is the amount of principal reduction. 
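
As an illustration, the recursion for balances taken after each payment, together with the split of each repayment into interest and principal, can be turned into a small amortization schedule. This is a sketch under our own assumptions: the function name is hypothetical, advances are entered as positive and repayments as negative, and the loan (1000 repaid by three level payments at 10%) is invented for the example.

```python
def amortization_schedule(cash_flows, rates):
    """Rows (k, repayment, interest, principal, balance) for k = 1, 2, ...

    cash_flows[k] is c_k (advances positive, repayments negative) and
    rates[k] is i_k, the rate for the period from k to k + 1.
    """
    balance = cash_flows[0]              # balance just after the time-0 flow
    rows = []
    for k in range(len(cash_flows) - 1):
        interest = rates[k] * balance
        new_balance = balance * (1.0 + rates[k]) + cash_flows[k + 1]
        repayment = -cash_flows[k + 1]
        principal = balance - new_balance    # reduction in the balance
        rows.append((k + 1, repayment, interest, principal, new_balance))
        balance = new_balance
    return rows

i = 0.10
payment = 1000 * i / (1 - (1 + i) ** -3)     # level payment that clears the loan
schedule = amortization_schedule([1000, -payment, -payment, -payment], [i] * 3)
for row in schedule:
    print(row[0], *(round(x, 2) for x in row[1:]))
```

Each row satisfies repayment = interest + principal, and the final balance is zero up to rounding.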

Let us now turn to reserves. They satisfy exactly the same recursion relation as balances. 
That is, 


$${}_{k+1}V(c) = ({}_kV(c) + c_k)\, v(k + 1, k), \tag{2.26}$$


where we start the recursion with the initial value ${}_0V(c) = -\ddot{a}(c)$. This is obvious in the case of 
zero-value vectors where reserves equal balances, but it holds in general, as seen for example 
from the relation ${}^{k+1}c = {}^k c - c_k e^k$, or alternatively derived from (2.19), substituting from 
(2.17) and using (2.11). 


2.11 Time shifting and the splitting identity 


In this section we provide a background for following some classical formulas that appear in 
the actuarial literature. 
From (2.16), taking values at time 0 and applying (2.11), we derive 


$$\ddot{a}(c) = \ddot{a}({}_k c) + v(k)\, \mathrm{Val}_k({}^k c). \tag{2.27}$$


We call this the splitting identity, since it splits the calculation of a present value into two 
parts, considering first those cash flows before time k, and then those from time k onward. It can also 
be obtained as a special case of the replacement principle. Replace all the cash flows from time 
k onward by the single payment of $\mathrm{Val}_k({}^k c)$ at time k. In classical actuarial mathematics, the splitting 
identity was often used as a calculation tool when there was a distinct change in the cash flow 
sequence or discount function occurring at time k. (It can still be useful in this regard in the 
continuous case discussed in Chapter 8.) 

The remainder of this section deals mainly with notational issues, and could be deferred 
until we apply it later in Section 4.3.2. The splitting identity often appears in a somewhat 
different form, since the symbol $\mathrm{Val}_k$ is not standard. In traditional actuarial notation, it is 
common to express all formulas in terms of $\ddot{a}$ (or similar symbols which we introduce later). So 
the question is then, how do we write $\mathrm{Val}_k$ in terms of $\ddot{a}$? This is quite simple when payments 
and interest rates are constant. In the general case we must introduce some new notation for 
time shifting. 

Suppose we are at time k and we wish to consider this as the ‘new’ time 0. Given a cash 
flow vector c, define the cash flow vector $c \circ k$ by 


$$(c \circ k)_n = c_{n+k}.$$


For example, if c = (2, 2, 3, 4, 5), then $c \circ 2$ = (3, 4, 5). In other words, $c \circ k$ simply gives the 
cash flows in order, but starting with the one at time k. Another way of looking at it is that $c \circ k$ 
is just ${}^k c$ with the first k zeros removed. This notation was not needed with constant payment 
vectors, since when c is $(1_n)$, then $c \circ k = (1_{n-k})$. 

Similarly, given a discount function v, define a new discount function $v \circ k$ by 


$$v \circ k(n, m) = v(n + k, m + k)$$
so that 
$$v \circ k(n) = v(k, k + n).$$
Once again, the idea is simply that we are treating time k as time 0 and measuring time from 
that point. 
It is quite simple to calculate $v \circ k$ from the factors for periods of length 1 as we did in 
(2.6). We multiply as before, but start with v(k, k + 1) rather than v(0, 1). 


An important point to notice is that, from (2.9) 


$$v \circ k = v \quad \text{if interest is constant},$$
which explains why this concept does not appear in classical works where interest is almost 
always taken as constant. 
We can now write 


$$\mathrm{Val}_k({}^k c) = \sum_{i=k}^{N} v(k, i)\, c_i = \sum_{i=k}^{N} v \circ k(i - k)\, (c \circ k)_{i-k} = \sum_{j=0}^{N-k} v \circ k(j)\, (c \circ k)_j = \ddot{a}(c \circ k; v \circ k), \tag{2.28}$$


which is intuitively clear since we are starting at time k, treating it as a new time 0 and valuing 
the future payments. We can then write the splitting identity in its usual form 


$$\ddot{a}(c; v) = \ddot{a}({}_k c; v) + v(k)\, \ddot{a}(c \circ k; v \circ k). \tag{2.29}$$
Under constant interest, we can write this in an easier fashion as 
$$\ddot{a}(c) = \ddot{a}({}_k c) + v^k\, \ddot{a}(c \circ k). \tag{2.30}$$
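
The time-shifting operations and the splitting identity (2.29) can be checked numerically. The following is a minimal sketch; the function names and the varying interest rates are our own assumptions, chosen only to exercise the general (non-constant) case.

```python
def discount_function(rates):
    """Return v with v(s, t) = value at time s of 1 paid at time t."""
    acc = [1.0]
    for i in rates:
        acc.append(acc[-1] / (1.0 + i))      # acc[t] = v(0, t)
    return lambda s, t: acc[t] / acc[s]

def present_value(c, v):
    """Present value of c under v: the sum of v(0, j) * c_j."""
    return sum(v(0, j) * cj for j, cj in enumerate(c))

def shift_vector(c, k):
    """c o k: the cash flows from time k onward, re-indexed from 0."""
    return c[k:]

def shift_discount(v, k):
    """v o k, defined by (v o k)(n, m) = v(n + k, m + k)."""
    return lambda n, m: v(n + k, m + k)

rates = [0.03, 0.04, 0.05, 0.06, 0.07]       # hypothetical varying rates
v = discount_function(rates)
c = [1, 2, 3, 4, 5]
k = 2

lhs = present_value(c, v)
rhs = present_value(c[:k], v) + v(0, k) * present_value(
    shift_vector(c, k), shift_discount(v, k))
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding, whatever rates and split point k are chosen.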


The splitting identity can sometimes be useful in determining the effect of changes in certain 
quantities. 


Example 2.6 Suppose that $\ddot{a}(c) = 19.6$ and $\ddot{a}({}_{10}c) = 10$. If v(9, 10) increases from 0.8 to 
0.81 while all other values of v(k, k + 1) remain the same, what is the new value of $\ddot{a}(c)$? 


Solution. From (2.29), with k = 10, we know that $v(0, 9)(0.8)\, \ddot{a}(c \circ 10; v \circ 10) = 9.6$, so 
$v(0, 9)(0.81)\, \ddot{a}(c \circ 10; v \circ 10) = 9.72$ and the new value of $\ddot{a}(c)$ is 19.72. 


*2.11 Change of discount function 
This section is somewhat technical. Its purpose is to derive a useful relationship that allows 
us to replace one discount function with another, which could be more convenient for the task 
at hand. 


Suppose we are given a cash flow vector c and a discount function v. Let v' be another 
discount function. Define a new cash flow vector c' whose kth entry is given by 


$$c'_k = c_k + [v'(k, k + 1) - v(k, k + 1)]\, B_{k+1}(c; v). \tag{2.31}$$
It follows that 


$$B_k(c; v) = B_k(c'; v'), \quad \text{for all } k. \tag{2.32}$$
We prove (2.32) inductively. To simplify the notation, denote $B_k(c; v)$ by $B_k$, $B_k(c'; v')$ by $B'_k$, 
v(k, k + 1) by v and v'(k, k + 1) by v'. At k = 0, the balances are both equal to 0. We will 
assume that $B_k = B'_k$ and show that $B_{k+1} = B'_{k+1}$: 


$$B'_{k+1} = \frac{B'_k + c'_k}{v'} = \frac{B_k + c_k}{v'} + \frac{B_k + c_k}{v} \cdot \frac{v' - v}{v'}.$$


Formula (2.32) for k + 1 follows after some minor algebraic manipulation. 

A corollary of this is that if c has zero value with respect to v, then c' has zero value with 
respect to v'. This follows since a cash flow vector has zero value if and only if its balances 
are eventually equal to 0. 

We will later apply this result to a key idea in life insurance. In this chapter we will give 
a formal proof of the assertion used in Section 2.9. 

Consider a loan for N periods made according to a discount function, which we will denote 
by v' to tie in with the notation above. We let d' and i' refer to the corresponding discount and 
interest rates. Suppose the loan advances are given by the vector $c = (c_0, c_1, \ldots, c_{N-1})$. Let 


$$s = \sum_{k=0}^{N-1} c_k, \qquad r = (d'_0 c_0,\; d'_1 (c_0 + c_1),\; d'_2 (c_0 + c_1 + c_2),\; \ldots,\; d'_{N-1} s).$$


We want to show that c is actuarially equivalent to $(r + s e^N)$ with respect to v'. This will 
mean that we can repay any loan by paying interest each year on the prior advances, and 
then repaying the principal at the end. While this may be intuitively clear, it is important to 
verify it formally to ensure that our model captures the idea of interest the way one normally 
perceives it. Let v(k) = 1 for all k, the discount function for a constant interest rate of zero. It 
follows that 


$$B_k(c; v) = \sum_{i=0}^{k-1} c_i$$


so that the vector c' as given by (2.31) is just c - r. Invoking (2.32), this gives 
$$\mathrm{Val}_N(c - r - s e^N; v') = B_N(c'; v') - s = B_N(c; v) - s = 0,$$


showing the desired actuarial equivalence. 
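
A numerical check of (2.31) and (2.32) may help. The sketch below is ours: the function names, cash flows and discount factors are all hypothetical, and `factors[k]` stands for the one-period factor v(k, k + 1).

```python
def balances(c, factors):
    """B_0, ..., B_N from recursion (2.21); factors[k] = v(k, k+1)."""
    b = [0.0]
    for ck, vk in zip(c, factors):
        b.append((b[-1] + ck) / vk)      # v(k+1, k) = 1 / v(k, k+1)
    return b

def change_discount(c, factors, new_factors):
    """The vector c' of (2.31)."""
    b = balances(c, factors)
    return [ck + (w - vk) * b[k + 1]
            for k, (ck, vk, w) in enumerate(zip(c, factors, new_factors))]

c = [100.0, 50.0, -30.0, 80.0]           # hypothetical cash flows
v_old = [1 / 1.05, 1 / 1.06, 1 / 1.04, 1 / 1.03]
v_new = [1 / 1.02, 1 / 1.08, 1 / 1.05, 1 / 1.07]

c_new = change_discount(c, v_old, v_new)
old = balances(c, v_old)
new = balances(c_new, v_new)             # should reproduce `old`

# Special case used for the loan: start from the zero-interest function.
zero_rate = [1.0] * len(c)
c_minus_r = change_discount(c, zero_rate, v_new)
print(old, new, c_minus_r)
```

In the special case, each entry of `c_minus_r` equals the advance less the discount charged on the advances made so far, which is the vector c - r of the text.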


2.12 Internal rates of return 


In this section we take the point of view of an investor or lender, who is undergoing a 
transaction that involves investing funds, hoping to get back amounts that are greater in value 
than those put in. Let $c = (c_0, c_1, \ldots, c_N)$ denote the net cash flow vector of the transaction. We 
assume throughout that $c_N \neq 0$. We are not given a discount function but rather want to find 
the constant interest rate that will make c a zero-value vector. In this section it is convenient 
to deal with balances that include the payment due at the time the balance is computed. That 
is, we want to consider $\tilde{B}$ as defined above. 
Definition 2.8 An internal rate of return (often abbreviated as i.r.r.) of the transaction is a 
number i in the interval (-1, ∞) such that, for the discount function $v(n) = (1 + i)^{-n}$, 


(i) $\tilde{B}_N(c; v) = 0$, and 
(ii) $\tilde{B}_k(c; v) \le 0$ for k = 0, 1, ..., N - 1. 


An i.r.r. is sometimes referred to as a yield rate. 


Remark The definition is unchanged if we use B in place of $\tilde{B}$, since $B_{k+1} = \tilde{B}_k(1 + i)$. 

Theorem 2.1 If an i.r.r. of a transaction exists, it is unique. 


Proof. Suppose, to the contrary, that we have two such rates, i and i', with i < i'. Let $\tilde{B}_k$ and 
$\tilde{B}'_k$ denote the resulting balances. Suppose that $c_j$ is the first nonzero entry in c. We must have 
j < N since j = N would imply that $\tilde{B}_N = c_N \neq 0$. We will now show by induction that, for 
k ≥ j + 1, we have 


$$\tilde{B}_k > \tilde{B}'_k. \tag{2.33}$$


If (2.33) holds for some index k then it will hold for k + 1 since the nonpositivity of 
balances implies that 


$$\tilde{B}_{k+1} = \tilde{B}_k (1 + i) + c_{k+1} > \tilde{B}'_k (1 + i') + c_{k+1} = \tilde{B}'_{k+1}.$$


Applying the above for k = j, and noting that $c_j = \tilde{B}_j = \tilde{B}'_j < 0$, verifies the initial step. So 
we have shown that $\tilde{B}_N > \tilde{B}'_N$, contradicting the fact that both rates satisfy (i). 


The reader familiar with previous literature on this subject may well be surprised at this 
theorem, since a great deal has been written on the non-uniqueness of the i.r.r. In our opinion 
this occurs from a faulty definition. The standard way of defining the i.r.r. is to require only 
point (i) of Definition 2.8 and not (ii). The problem then reduces to finding the roots of the 
polynomial $\sum_{k=0}^{N} c_k (1 + i)^{-k}$, and there may indeed be several. However, when balances 
become positive, it means that the status of the individual has changed from that of a lender 
to that of a borrower. The transaction therefore involves both borrowing and lending, and it 
is difficult to give any interpretation to the roots. After all, a lender is seeking a high i.r.r. 
whereas a borrower wants it to be low. We feel that the best approach is to give the definition 
above and concede that other transactions just do not possess an i.r.r. under this definition. 
Other means are needed for their analysis. 

Note however that no problem is presented by the usual type of transaction, whereby the 
loan advances, represented by negative entries, all precede the loan repayments, as given by 
positive entries. The balances will begin as negative. If they ever become positive, it could 
only be when the entries change to positive. But as the entries then remain positive, the 
balance could then never become 0. Therefore, in this case, if an interest rate i satisfies (i) it 
will necessarily be an i.r.r. 
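
Definition 2.8 can be sketched in code as a root search followed by a check of condition (ii). This is only an illustration under stated assumptions: the function names and the bracket [lo, hi] are ours, the search is plain bisection and presumes a single sign change of the final balance on the bracket, and no attempt is made to look for other roots when the one found fails (ii).

```python
def tilde_balances(c, i):
    """Balances including the payment due at each time, at constant rate i."""
    b = [c[0]]
    for ck in c[1:]:
        b.append(b[-1] * (1.0 + i) + ck)
    return b

def irr(c, lo=-0.99, hi=10.0, tol=1e-12):
    """Rate with final balance 0 and earlier balances nonpositive, else None."""
    f = lambda i: tilde_balances(c, i)[-1]
    if f(lo) * f(hi) > 0:
        return None                      # no sign change on the bracket
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    i = (lo + hi) / 2
    ok = all(b <= 1e-9 for b in tilde_balances(c, i)[:-1])   # condition (ii)
    return i if ok else None

usual = irr([-1000, 400, 400, 400])             # advance, then repayments
mixed = irr([-1, 2.3, -1.32], lo=0.0, hi=0.15)  # root at 0.10 fails (ii)
print(usual, mixed)
```

The second vector has polynomial roots at 0.10 and 0.20, but its balance turns positive at time 1, so the sketch reports no i.r.r. for it.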

The remaining material in this section can be omitted as it is not used in the remainder 
of the book. To handle cases where the i.r.r. does not exist by the definition given above, a 
generalization of the definition was given by Teichroew et al. (1965a, 1965b), often referred 
to as the TRM method. Their approach was to recognize that when the status of a lender or 
an investor reverts to that of a borrower, as indicated by a positive balance, then that balance 
should accumulate at some rate r fixed in advance and not at the unknown, presumably higher 
rate that one is trying to solve for. The rate r could be, for example, the rate at which the 
individual could obtain financing. They then define a generalization of balances with respect to 
this rate r, building on the recursive formula (2.21). Let $\tilde{B}_0(c; v; r) = c_0$ and define inductively 


$$\tilde{B}_{k+1}(c; v; r) = \begin{cases} \tilde{B}_k(c; v; r)(1 + i) + c_{k+1} & \text{if } \tilde{B}_k(c; v; r) \le 0, \\ \tilde{B}_k(c; v; r)(1 + r) + c_{k+1} & \text{if } \tilde{B}_k(c; v; r) > 0. \end{cases}$$


Define the i.r.r. of c relative to r as the value of i for which $\tilde{B}_N(c; v; r) = 0$ when $v(k) = (1 + i)^{-k}$. 
We will denote this quantity as $i_r$. Note that if i is an i.r.r. by Definition 2.8, then it will be 
an i.r.r. with respect to r for any r. In this case balances are never positive and $\tilde{B}_k(c; v; r)$ will 
simply equal $\tilde{B}_k(c; v)$. 

The same type of inductive calculation as in the proof of Theorem 2.1 shows that as i 
increases, while k, r and a nonzero vector c are held fixed, the value of $\tilde{B}_k(c; v; r)$ is strictly 
decreasing. There is therefore at most one i.r.r. relative to r. Moreover, there will always be 
one except when either $\tilde{B}_N(c; v; r) < 0$ or $\tilde{B}_N(c; v; r) > 0$ for all i in (-1, ∞). Defining $i_r$ to be 
-1 in the first case and ∞ in the second case, we can state the following theorem. 


Theorem 2.2. For any nonzero vector c and any r > -1, there is a unique value of $i_r$ in the 
interval [-1, ∞]. 


This shows that we do obtain a unique i.r.r. for any transaction after we first postulate the 
deposit rate r. This gives rise to a simple criterion to decide if it is worthwhile to enter into a 
transaction. The general rule is that the transaction is worthwhile if $i_r > r$, and not worthwhile 
if $i_r < r$. 
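
The generalized rate can be computed by bisection, since the final balance decreases as i increases. The sketch below is an assumption-laden illustration rather than the TRM authors' procedure: the function names and search limits are ours, and the two boundary cases are detected only approximately, by evaluating near -1 and at a large finite rate.

```python
def trm_final_balance(c, i, r):
    """Final balance when nonpositive balances grow at i, positive ones at r."""
    b = c[0]
    for ck in c[1:]:
        b = b * (1.0 + (i if b <= 0 else r)) + ck
    return b

def irr_relative_to(c, r, lo=-0.999999, hi=1e3, tol=1e-10):
    """Bisection for the i.r.r. of c relative to r."""
    if trm_final_balance(c, lo, r) < 0:
        return -1.0                      # final balance negative throughout
    if trm_final_balance(c, hi, r) > 0:
        return float("inf")              # final balance positive throughout
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if trm_final_balance(c, mid, r) > 0:
            lo = mid                     # balance decreasing in i: go right
        else:
            hi = mid
    return (lo + hi) / 2

c = [-1, 2.3, -1.32]                     # no i.r.r. under Definition 2.8
low = irr_relative_to(c, 0.05)           # below r = 0.05: not worthwhile
high = irr_relative_to(c, 0.15)          # above r = 0.15: worthwhile
print(low, high)
```

For this vector the balance at time 1 is positive, so the final balance is (1.3 - i)(1 + r) - 1.32, and the two results can be confirmed by hand.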


*2.13 Forward prices and term structure 


An interesting example of a discount function is furnished by the forward prices on risk-free 
zero-coupon bonds. A zero-coupon bond is a financial instrument that promises to pay a 
fixed sum at some future date, known as the maturity date, and which makes no intervening 
payments (coupons) before that time. By risk-free we mean something like a government 
bond, where we can safely assume that the bond is sure to be redeemed with no chance 
of default. 

We first discuss the general idea of a forward contract. This is an agreement between two 
parties, whereby one party, the seller, agrees to deliver a certain specified asset at a specified 
future time to the other party, the buyer. At the time of delivery, the buyer will remit a sum 
of money that is agreed upon at the time when the contract is entered into and is independent 
of the prevailing price at the time of exchange. The price agreed upon at the contract date is 
known as the forward price of the contract. The buyer is often referred to as taking a long 
position and the seller is referred to as taking a short position. 

There are various reasons for such an arrangement. The buyer may be someone who needs 
the particular item at some future date, and wishes to lock in a price now, to protect themselves 
from a future increase in the price. Alternatively, the buyer may be a speculator with no need 
of the item, but who predicts that prices will rise, thereby giving them a profit as they can buy 
the asset at the delivery date and immediately sell it for a higher price. Similarly, the seller 
may be one who owns the asset, wishes to dispose of it at a future date, and wants to lock in 
the amount they will receive as protection against falling prices. Alternatively, the seller may 
be a speculator who predicts that prices will fall, and hopes to profit by buying the item at the 
delivery date for less than they have agreed to sell it for. 

Let v(0, t) = v(t) be the price at time 0 of a 1-unit, zero-coupon, risk-free bond maturing 
at time t. For any s < t, let v(s, t) be the forward price for a 1-unit zero-coupon bond, maturing 
at time t, where the delivery date is time s. Now what will these forward prices be? It turns 
out that under a certain natural assumption they are determined from the time zero prices by 
the rule 


$$v(s, t) = v(t)/v(s) \tag{2.34}$$


which implies immediately that v is a discount function. (Note that in the notation of 
Section 2.11 forward prices as determined from time s are given by the time shifted discount 
function $v \circ s$.) 

The assumption made is the no arbitrage hypothesis, which is a major concept in modern 
day financial economics. An arbitrage opportunity is one where a party can make a sure 
profit by buying and selling certain financial assets with no risk of a loss. The hypothesis 
in question says that in an environment when all parties have perfect information, arbitrage 
opportunities cannot persist. This is a simple result of supply and demand laws in economics. 
The arbitrage opportunities occur when some assets are overvalued and others are undervalued. 
The argument is that if such an opportunity should ever arise, individuals will, in an attempt 
to take advantage of this, rush to buy the undervalued assets, thereby raising their prices, and 
rush to sell the overvalued assets, thereby lowering their prices, which restores equilibrium 
and eliminates the arbitrage. 

We also make some other idealized assumptions. One is that an arbitrary number of units 
of a bond can be bought or sold at any time. This includes the possibility of short selling 
whereby an individual can sell an asset that they do not own by borrowing the asset from 
another party. They plan to acquire the asset at a later date (hopefully at a lower price than 
they sold it for) for return to the lender. Another assumption is that buying or selling does not 
involve any transaction costs such as commissions. 

To verify our claim above we will show by an example that in the absence of Equation 
(2.34) an arbitrage opportunity would arise. Suppose, for example, that v(10) = 0.75 and 
v(5) = 0.90 but v(5, 10) = 0.8 < v(10)/v(5). An individual could sell a 1-unit bond maturing 
at time 10, receiving 0.75 and then (i) use the proceeds to buy 5/6 of a 1-unit bond maturing at 
time 5, and (ii) take a long position on a forward contract for a 1-unit bond maturing at time 10 
with a delivery date of time 5. At time 5, the individual receives 5/6 from the maturing 5-year 
bond, uses 4/5 of that to settle the forward contract, and now owns a 1-unit bond maturing at 
time 10, which they use to settle the short sale. A sure profit of 5/6 - 4/5 is made at time 5. 

The reader is invited to find a corresponding example of an arbitrage opportunity in the 
case that v(s, t) > v(t)/v(s) and then to provide a general proof. 
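
The cash flows in the arbitrage example above can be tallied exactly. This short sketch uses Python's `fractions` module to avoid rounding; the variable names are ours.

```python
from fractions import Fraction

v5, v10 = Fraction(9, 10), Fraction(3, 4)    # time-0 prices v(5) and v(10)
forward = Fraction(4, 5)                     # quoted forward price v(5, 10)
no_arbitrage = v10 / v5                      # forward price required by (2.34)

# Time 0: sell short one 10-year bond and spend the proceeds on 5-year bonds.
units_of_5yr = v10 / v5                      # 5/6 of a unit
cash_at_0 = v10 - units_of_5yr * v5          # proceeds less cost

# Time 5: collect on the 5-year bonds and pay the forward price. The bond
# delivered under the forward contract then settles the short sale.
cash_at_5 = units_of_5yr - forward

print(no_arbitrage, cash_at_0, cash_at_5)
```

Nothing is paid out at time 0 and a positive amount is received at time 5, which is what makes the position an arbitrage.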

The analysis above indicates that for an individual or corporation whose investment 
environment consists of risk-free zero-coupon bonds, the forward prices constitute a reasonable 
choice of discount function. For example, the individual could ensure (under our assumptions) 
that a payment of 1 received at time s would accumulate to v(t, s) = 1/v(s, t) at a subsequent time t by entering 
into a suitable forward contract. 

We conclude this section by introducing some conventional terminology. We have given 
the above analysis in terms of prices, which seems the most convenient way, but typically the 
economic and finance literature refers to rates instead of prices. Assume we are given prices 
of risk-free zero-coupon bonds for all maturities. 


Definition 2.9 The spot rate of interest $y_t$ is the yield rate for the bond maturing at time t. 
That is 


$$v(t) = (1 + y_t)^{-t}$$
so that 


$$y_t = v(t)^{-1/t} - 1. \tag{2.35}$$


Definition 2.10 For s < t, the forward rate of interest f(s, t) is the yield rate earned for the 
bond maturing at time t and acquired on a forward contract with delivery at time s. That is 


$$v(s, t) = [1 + f(s, t)]^{s-t}$$


so that from (2.34), 


$$f(s, t) = \left(\frac{v(t)}{v(s)}\right)^{1/(s-t)} - 1. \tag{2.36}$$

The relationship between the various maturity dates and the spot rates is known as the 
term structure of interest rates. A graph which shows values of $y_t$ for various values of t is 
known as a yield curve. Examples of yield curves can be found in the financial and business 
sections of many daily newspapers, as well as being freely available online. The study of the 
various possible shapes of yield curves is an important topic for economists which is beyond 
the scope of this text. 

As a final word, note that although we have discussed spot and forward rates in the context 
of risk-free bonds, they can be defined, using formulas (2.35) and (2.36) for a general discount 
function. 
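
Formulas (2.35) and (2.36) translate directly into code. The sketch below uses function names of our own and, for its inputs, the two spot rates given in Example 2.7.

```python
def price_from_spot(y, t):
    """v(t) = (1 + y_t)^(-t)."""
    return (1.0 + y) ** (-t)

def spot_rate(v_t, t):
    """(2.35): y_t = v(t)^(-1/t) - 1."""
    return v_t ** (-1.0 / t) - 1.0

def forward_rate(v_s, v_t, s, t):
    """(2.36): f(s, t) = (v(t) / v(s))^(1/(s - t)) - 1."""
    return (v_t / v_s) ** (1.0 / (s - t)) - 1.0

v2 = price_from_spot(0.05, 2)            # two-year spot rate of 0.05
v5 = price_from_spot(0.07, 5)            # five-year spot rate of 0.07
f = forward_rate(v2, v5, 2, 5)
print(round(f, 3))
```

Because the five-year spot rate exceeds the two-year rate, the forward rate for years 2 to 5 comes out above both.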


Example 2.7 Given the spot rates $y_2 = 0.05$ and $y_5 = 0.07$, find the forward rate f(2, 5). 


Solution. 


$$f(2, 5) = \left(\frac{1.07^{-5}}{1.05^{-2}}\right)^{-1/3} - 1 = 0.084.$$


2.14 Standard notation and terminology 


There is a system of standard international actuarial notation that can seem quite complex at 
first. One goal in this book is to simplify notation as much as possible. In order to read this 
book, it will not be necessary to learn any of this system other than that which is introduced in 
the text. However, the student who wishes to read other actuarial literature, or write actuarial 
examinations, or pursue an actuarial career, will be expected to be familiar with the full extent 
of the system. Accordingly, we will, at the end of each relevant chapter, review the standard 
notation, indicating how it ties in with the notation in the main text. 


2.14.1 Standard notation for cash flows discounted with interest 


The vector notation is peculiar to this book. In the standard notation the same basic symbol a 
is used for the present value of a cash flow stream, but the particular vector of cash flows is 
indicated by embellishments of this symbol. A prime example is 


$$a_{\overline{n}|} = \ddot{a}(0, 1_n).$$


A few comments are in order. The ‘angle’ around the n is intended to signify a duration of 
time, as opposed to an age (which will be encountered in the chapters on life annuities and 
insurances). The vector above arises frequently. It is usual for a loan contract to stipulate that 
the first repayment is made one period after receiving the loan. One does not usually make a 
repayment as of the loan date, since that would in effect just mean you were getting a smaller 
loan. Therefore, the simplest unadorned symbol was reserved for the vector with first entry 
equal to 0. When there is a payment at time zero, standard notation denotes this by placing 
two dots above the a. That is, 


$$\ddot{a}_{\overline{n}|} = \ddot{a}(1_n).$$


The former case is termed an immediate annuity, and the latter a due annuity. The terminology 
is a bit unusual since the immediate annuity does not start immediately but after one period. 
The name arose as a contrast to the general deferred annuity where there are several zero 
entries at the beginning. Once we move away from level payments, the distinction between 
due and immediate annuities no longer applies. From our point of view, there is always a 
payment at time zero although it may be of zero amount. To avoid conflicts with standard 
notation we use the two dots as a general symbol. 

Another traditional way of looking at the difference between due and immediate was that 
the former applied when payments were made at the beginning of the year, and the latter when 
payments were made at the end of the year. Once again, these distinctions no longer apply 
when payments are not level, since the beginning of 1 year is the end of the preceding one. 
(We will, however, have need to refer to this point again, when we discuss fractional period 
payments in Chapter 7.) 

Some other symbols are 


$$(Ia)_{\overline{n}|} = \ddot{a}(0, 1, 2, \ldots, n, 0, \ldots, 0),$$
$$(I\ddot{a})_{\overline{n}|} = \ddot{a}(1, 2, \ldots, n, 0, \ldots, 0),$$
using the vector that we denoted by $j^n$ in Section 2.9, 


$$(Da)_{\overline{n}|} = \ddot{a}(0, n, n - 1, n - 2, \ldots, 1, 0, \ldots, 0),$$
$$(D\ddot{a})_{\overline{n}|} = \ddot{a}(n, n - 1, n - 2, \ldots, 1, 0, \ldots, 0).$$


These are simple enough to remember from the fact that I indicates an increasing sequence 
of payments and D indicates a decreasing sequence. 
The symbol s is used to indicate an accumulated value. So, for example, 


$$s_{\overline{n}|} = \mathrm{Val}_n(0, 1_n) = v(n, 0)\, a_{\overline{n}|},$$
$$\ddot{s}_{\overline{n}|} = \mathrm{Val}_n(1_n) = v(n, 0)\, \ddot{a}_{\overline{n}|}.$$


Note that in both cases the values are at time n. For the unadorned symbol this is at the date of 
the last payment. For the double-dotted symbol, time n is one period after the last payment, 
since the first payment was at time 0. 
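
The four level-payment symbols can be computed side by side to see how they relate. This is a minimal sketch for a constant rate i > 0; the function name and the choice of i = 0.06, n = 10 are ours.

```python
def annuity_symbols(i, n):
    """Immediate and due annuity values, and their accumulations at time n."""
    v = 1.0 / (1.0 + i)
    a_immediate = sum(v ** t for t in range(1, n + 1))   # payments at 1..n
    a_due = sum(v ** t for t in range(n))                # payments at 0..n-1
    s_immediate = a_immediate * (1.0 + i) ** n           # value at time n
    s_due = a_due * (1.0 + i) ** n                       # value at time n
    return a_immediate, a_due, s_immediate, s_due

a_imm, a_due, s_imm, s_due = annuity_symbols(0.06, 10)
print(round(a_imm, 4), round(a_due, 4), round(s_imm, 4), round(s_due, 4))
```

Each due value is (1 + i) times the corresponding immediate value, reflecting the fact that every payment is made one period earlier.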


2.14.2 New notation 


A reader who has already spent a great deal of effort in mastering the standard notation may 
have felt some dismay in having to learn yet more notation introduced in this chapter. We feel, 
however, that the devices introduced here are useful. There are three main innovations: the 
vector notation; the symbol Val; and the time-shifting notation. We consider each in turn. 

The vector notation allows us to conveniently refer to an arbitrary sequence of cash flows 
in a systematic way. The traditional notation uses special embellishments for each particular 
sequence, and even so, is restricted to a small number of cases, such as payments constant 
over some period, or payments in arithmetic progression. 

The system of writing vectors is useful since it tells you exactly how to enter a cash flow 
vector into a column of a spreadsheet. For example, faced with the vector $(1_{20}, 2_{10})$, one 
simply enters a 1 in the first cell of the column, copies it down for 20 rows, then enters a 2, and copies 
it down for 10 rows. 

The standard notation allows you to specify values only at the beginning or end of a 
transaction, and it is often useful to denote values at an intermediate point, through the use of $\mathrm{Val}_k$. 

Having introduced the vector notation, it is convenient to introduce the lower and upper 
subscripts to denote past and future, respectively. Such a device was not needed in the classical 
literature that dealt with level cash flows and interest rates. For example, the vector ${}_k 1_n$ is just 
$1_k$. Similarly, a symbol like our ∘ for time shifting could easily be avoided since, as noted, 
$v \circ k = v$. Moreover, $1_n \circ h$ is just equal to $1_{n-h}$. There are, however, examples in the literature 
where it is necessary to refer to general time-shifted cash flows, and ad hoc symbols are used 
(see Bowers et al., 1997, p. 519). We feel it is much preferable to have a consistent notation 
to handle these. 


2.15 Spreadsheet calculations 


Here and at the end of Chapters 4, 5, 6, 9 and 10 we will describe spreadsheets for doing the 
basic calculations pertinent to the material of the respective chapter. Our descriptions will refer 
in particular to Microsoft Excel®, and we assume the reader is reasonably acquainted with 
this application. The basic ideas should be easily adaptable to other formats as well. Readers 
can reproduce our spreadsheets as given, or they may wish to come up with their own. 

Our general idea is to enter various functions of the duration k in columns starting with 
row 10 for k = 0. The cells above can then be used for headings or information. 

We put the duration k in column A. That is, we insert 0 in cell A10, 1 in cell A11, etc. 
This is done by putting the formula =A10+1 in cell A11 and copying down to cell A10+N. 

The usual way of specifying the discount function is through the one-period interest rates
$i_k$. We enter these in column B, with $i_k$ being inserted into cell B10+k. We then calculate
the vector of $v(k)$ values in column C. Insert 1 in cell C10 and the recursive formula (2.7) is
entered into cell C11 as =C10/(1+B10). This is copied down column C to cell C10+N.

We enter the cash flows in column D. The cash flow $c_k$ is put into cell D10+k. The value
$\ddot{a}(c; v)$ is then calculated in cell D8 through the formula


=SUMPRODUCT($C10:$C10+N*D10:D10+N)


where we substitute for the particular value of N. As a check, take a constant interest rate of
0.06, and the vector $(1_{10}, 2_5)$. The answer is 12.7883. By copying cell D8 to the right, we can
do calculations for several different cash flow vectors.

Values at time k are easily obtained by dividing $\ddot{a}(c)$ by $v(k)$, which is in cell C(10+k).
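The column layout described above can also be sketched outside a spreadsheet. The following is a minimal Python illustration, not part of the spreadsheet itself; the function names `discount_vector`, `present_value` and `value_at` are hypothetical. It builds the discount values recursively from the one-period rates, forms the sum of products, and uses a check vector of ten 1s followed by five 2s at 6%.

```python
def discount_vector(rates):
    """Return [v(0), ..., v(N)] from one-period rates i_0, ..., i_{N-1}.

    Mirrors column C: v(0) = 1 and v(k+1) = v(k) / (1 + i_k).
    """
    v = [1.0]
    for i_k in rates:
        v.append(v[-1] / (1.0 + i_k))
    return v


def present_value(cash_flows, v):
    """Sum of c_k * v(k): the role played by SUMPRODUCT in cell D8."""
    return sum(c_k * v_k for c_k, v_k in zip(cash_flows, v))


def value_at(k, cash_flows, v):
    """Value at time k: the present value divided by v(k)."""
    return present_value(cash_flows, v) / v[k]


# Check case: ten payments of 1 followed by five payments of 2, at 6%.
cash_flows = [1.0] * 10 + [2.0] * 5
v = discount_vector([0.06] * (len(cash_flows) - 1))
pv = present_value(cash_flows, v)
print(round(pv, 4))  # 12.7883
```

Passing a list of varying rates to `discount_vector` handles non-constant interest in the same way as filling column B with different values.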


Notes and references 


This chapter is not intended to provide an exhaustive treatment of the mathematical theory of 
interest. The goal was to provide that portion of the subject that is needed for the remainder of 
the book. Some additional materials will also be given in Chapters 7 and 8. Readers interested 
in a more detailed account can consult Broverman (2010) or Daniel & Vaaler (2009). 
Additional work on internal rates of return can be found in Promislow (1980, 1997) and 
Teichroew et al. (1965a, 1965b). The concept of the generalized i.r.r. is due to the latter. 


Exercises 


Type A exercises 
2.1 You are given a discount function v where v(1, 3) = 0.9, v(3,6) = 0.8, v(8,6) = 1.2 


(a) How much must you invest at time 1, in order to accumulate 10 at time 8? 
(b) If you invest 100 at time 3, how much will have accumulated by time 8? 


2.2 If $v(t) = 2^{-t}$, and you are given cash flow vectors c = (1, 2, 3) and e = (2, K, 1), find
K so that c and e are actuarially equivalent with respect to v. 


2.3 You are given interest rates $i_0 = i_1 = 0.25$, $i_2 = i_3 = 1$. You have entered into a business
transaction where you will receive 2 at time 0, 5 at time 3 and 10 at time 4, in return
for a payment by you of 3 at time 2. In place of all these cash flows you are offered
a single payment made to you at time 1. What is the smallest payment you would
accept?
2.4 You are given rates of discount as follows: $d_k = 1/3$, for k = 0, 1, 2, and $d_k = 1/4$,
for k = 3, 4. The vector c = (65, 93, 12). Find (a) the discount vector v, (b) $\ddot{a}(c)$ and
(c) Val, (c). 


2.5 Let c = (1, 2, 4, -3, 8, -12). Suppose v(0, 1) = v(1, 2) = 0.8, v(2, 3) = v(3, 4) =
0.75, v(4, 5) = 0.5. Find (a) ${}_3V$, (b) $B_4$.


2.6 A discount function satisfies 


«9-r*h-5]. k=0,1,2,...,5. 
For the vector c = (1, -2, 4, 3, -3, -5) find (a) ${}_3V(c)$, (b) $B_3(c)$.


2.7 Given that ${}_8V(c) = 100$, $c_8 = 60$, $c_9 = 70$, v(8, 9) = 0.8, v(9, 10) = 0.75, v(0, 10) =
0.5, and that $\ddot{a}(c) = 40$, find (a) ${}_{10}V(c)$, (b) $B_{10}(c)$.


2.8 Given the vector c = (1,25, 74) and a discount function v satisfying v(k) = 1 -
k/10, k = 0, 1, ..., 5, find $v \circ 3(k)$ for k = 0, 1, 2; the vectors ${}_3c$ and $c \circ 3$; and the
present values $\ddot{a}(c; v)$, $\ddot{a}({}_3c; v)$, $\ddot{a}(c \circ 3; v \circ 3)$. Verify that formula (2.29) holds.


Type B exercises 
2.9 Define a two variable function for s, t > 0 by 


_fdte-s)y!, ifs<t, 
DL Erde ifs >t. 


Show that v is not a discount function. 


2.10 Show that, given any positive-valued function g of one variable, 


$$v(s, t) = \frac{g(t)}{g(s)}$$


defines a discount function. 


2.11 Suppose that $v_1$ and $v_2$ are two discount functions. Is either of the functions $v_1 v_2$ or
$v_1 + v_2$ a discount function? These are defined by


$$v_1 v_2(s, t) = v_1(s, t)\, v_2(s, t),$$
$$[v_1 + v_2](s, t) = v_1(s, t) + v_2(s, t).$$

2.12 Assume constant interest. 
(a) Show that $\ddot{a}(0, 1_n) = \ddot{a}(1_n) - 1 + v^n$.


(b) Show that $\ddot{a}(0, 1_n) = (1 - v^n)/i$. Try to do this from part (a) and also directly from
(2.14).


2.13 Let $k^n$ denote the vector (n, n - 1, n - 2, ..., 2, 1). Show that

$$\ddot{a}(k^n) = \frac{n - \ddot{a}(0, 1_n)}{d}$$

in three different ways:
(a) From first principles, as in the first derivation of (2.17);
(b) By making use of a linear relationship between the vectors $(1_n)$, $k^n$ and the vector
$j^n$ of Section 2.9;
(c) By using the loan-interest argument given at the end of Section 2.9.


2.14 Suppose that v(2, 7) = 0.5, v(2, 8) = 0.4, v(2, 9) = 0.3. Find a vector d actuarially
equivalent to c = (1, 3, 5, 2, 9, 10, 6, 4, 8, 3) such that $d_i = c_i$ for i < 7 and $d_i = 0$
for i > 7.


2.15 A loan of 2300 is to be repaid by n yearly payments of 230 beginning at time 5. The
borrower is given the option of repaying only 115 at time 5, but must then pay 240 in
each of the subsequent payments. Find v(5). (An exact numerical answer is required.
It should not be a function of n.)


2.16 A loan of 20 000, made at an interest rate of 6%, is to be repaid by level yearly
payments for 10 years, beginning 1 year after the loan is advanced. Just before
making the seventh repayment, the borrower wishes to repay the entire loan.
(a) If interest rates remain unchanged, what is the outstanding balance?
(b) Suppose interest rates have dropped to 5%. How much will the borrower have to
pay if the lender uses the lower interest rate to calculate the outstanding balance?


2.17 A person has 1000 now and plans to invest it for 5 years. He is trying to decide between
two alternatives. The first is to buy a bond that matures in 5 years. The second is to
buy a bond that matures in 10 years and sell it at the end of 5 years. Assume that in
both cases the bonds have no payments before maturity and can be purchased today
at an interest rate of 6%. How much better or worse off is the individual at the end of
5 years if he chooses the second alternative instead of the first, assuming that at time
5 the interest rate for this class of bonds is (a) 4%, (b) 7%?


2.18 For a certain cash flow vector c, $c_0 = 1$, $c_1 = 5$, $\ddot{a}(c) = 15$. If the discount function
is changed so that v(1, 2) is decreased by 0.1, while all other values of v(n, n + 1)
remain unchanged, then $\ddot{a}(c)$ decreases by 2.4. On the other hand, if v(0, 1) is
decreased by 0.1 while all other values of v(n, n + 1) remain unchanged, then $\ddot{a}(c)$
decreases by 2. Find v(0, 1) and v(1, 2).


2.19 c and d are actuarially equivalent vectors such that c is constant and d is nondecreasing.
Show that ${}_kV(c - d) \geq 0$, for k = 0, 1, ..., N.


*2.20 For the cash flow vector (-1, 3, -2),
(a) show that the i.r.r. does not exist;
(b) find $i_r$ as a function of r.

*2.21 Suppose that for a cash flow vector c, the number r satisfies $B_N(c; v) = 0$, where
$v(n) = (1 + r)^{-n}$. Show that $i_r = r$.


*2.22 Show that when interest is constant at rate r, then $\ddot{a}(c) > 0$ if and only if $i_r > r$. (This
shows that the criteria for determining if it is worthwhile to enter into a transaction 
given after Theorem 2.2 are in accordance with the remarks in Section 2.1.) 


*2.23 Suppose that current interest yields on risk-free bonds are 4% for a 5-year bond and 
5% for a 10-year bond. Calculate the forward price to be paid at the end of 5 years for 
a zero-coupon bond that pays 1000 units, 10 years from today. Suppose that instead 
of the number you just calculated, this forward price is 800. Illustrate how you would 
make a sure profit (under the idealized conditions given in Section 2.12). 


*2.24 The current price of a 1-unit zero-coupon bond maturing in 10 years is 0.610. The
forward rate f(6, 10) = 0.06. Find the spot rate $y_6$.


Spreadsheet exercises 


2.25 A loan contract involves advances of 10 000 initially, 20 000 one year later, and
30 000 one year after that. This is to be repaid by 20 yearly instalments beginning at time
3. The payments reduce by 5% each year (so, for example, if the first payment was 
1000, the second would be 950, the third would be 902.50). Interest rates are 6% for 
the first 5 years, 7% for the next 5 years, and 8% after that. Find all payments and 
the outstanding balances at the end of each year, until the loan is fully discharged. 
Balances should be calculated after the payment due at the particular time is made. 


The life table 


3.1 Basic definitions 


For the actuary working in the life insurance field, a major objective is to estimate the mortality 
pattern which will be exhibited by a group of individuals. A basic device for accomplishing 
this is known as a life table. (It is also known as a mortality table — an interesting example of 
a word and its opposite being used interchangeably.) 

Let $\ell_0$ be an arbitrary number, usually taken to be a round figure such as 100 000. Suppose
we start with a group of $\ell_0$ newly born lives. We would like to predict how many of these
individuals will still be alive at any given time in the future. Of course, we cannot expect to
compute this exactly, but we can hope to arrive at a close estimate if we have sufficiently good
statistics. In the first part of this book we will make the assumption that we can indeed arrive
at exact figures. This is in keeping with the concept of a deterministic model introduced in
Chapter 1. In Part II, we introduce the stochastic model for mortality, where we investigate
the more realistic assumption that the quantities we want are random variables. Let $\ell_x$ be the
number of those original lives aged 0 who will still be alive at age x, and let $d_x$ be the number
of those original lives aged 0 who die between the ages of x and x + 1. The basic relationship
between these quantities is


$$\ell_{x+1} = \ell_x - d_x. \tag{3.1}$$

A life table is a tabulation of $\ell_x$ and $d_x$ where x is a nonnegative integer. The following
is an example of a portion of a life table (this is an illustration only, and the figures are not
intended to be realistic):
x           $\ell_x$      $d_x$

0           100 000       2000
1            98 000       1500
2            96 500       1000
3            95 500        900
...
$\omega$          0


The table will end at some age, traditionally denoted by $\omega$, such that $\ell_\omega = 0$. This is the
limiting age of the table, and denotes the first age at which all of the original group will have
died. The actual value of $\omega$ will vary with the particular life table, but it is typically taken to
be around 110 or higher.


3.2 Probabilities


Although we assume we can predict $\ell_x$ exactly, there is still randomness in our model,
since it is not known whether or not any given individual will be among the survivors at a 
particular point of time. It is convenient to introduce some elementary probabilistic notions. 
For nonnegative integers n and x, let 


$${}_np_x = \frac{\ell_{x+n}}{\ell_x}. \tag{3.2}$$

What is the meaning of this term? Consider the $\ell_x$ survivors age x. Out of this group, $\ell_{x+n}$ will
survive to age x + n. The quotient then gives us the probability that a person aged x, hereafter
denoted just by the symbol (x), will be alive at age x + n. Let


$${}_nq_x = \frac{\ell_x - \ell_{x+n}}{\ell_x}. \tag{3.3}$$


This gives us the probability that (x) will die between the ages of x and x + n. It is clear that 


$${}_nq_x = 1 - {}_np_x. \tag{3.4}$$


As an example, in the table given above we would have ${}_2p_0 = 965/1000$, ${}_2q_1 = 25/980$.
Since a left subscript of ‘1’ occurs frequently it is omitted for notational convenience.
That is, $p_x$ denotes ${}_1p_x$, and $q_x$ denotes ${}_1q_x$. The quantity $q_x$ is often referred to as the mortality
rate at age x.
What is the probability that (x) will die between the ages of x + n and x + n + k? This is a 
quantity which we will use frequently. There are three main ways of expressing it: 


$$\frac{\ell_{x+n} - \ell_{x+n+k}}{\ell_x} \tag{3.5a}$$

or

$${}_np_x - {}_{n+k}p_x \tag{3.5b}$$

or

$${}_np_x \, {}_kq_{x+n}. \tag{3.5c}$$


The reader should verify, by substituting values of $\ell_x$, that all three expressions are equal.
They can each be explained intuitively. Consider the first. The numerator is the number of 
people living at age x +n, less the number living at age x +n +k. This difference must be 
the number of people who died between the two ages. Dividing by the number of people that 
we start with will give us the required probability. In the second expression we express this 
quantity as the probability that (x) will live n years, but will not live n + k years. In the third 
expression we consider two stages. To die between the specified ages, (x) must first live to age 
x+n. The individual, then being age x + n, must die within the next k years. We will have 
occasion to use all three of these expressions, choosing the one which is most convenient for 
the purpose at hand. 
Another useful identity, which we will refer to as the multiplication rule, is 


$${}_{n+k}p_x = {}_np_x \, {}_kp_{x+n}, \tag{3.6}$$


for all nonnegative integers n, k and x. It can be verified directly from (3.2). Intuitively, it says 
that in order for (x) to live n + k years, the individual must first live n years, and then, being 
age x + n, must live another k years. 
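The equality of the three expressions in (3.5), and the multiplication rule (3.6), can be checked numerically. The sketch below is only an illustration in Python using the unrealistic figures of the example table in Section 3.1; the helper names `p` and `q` are hypothetical stand-ins for the symbols with left subscript n.

```python
# l_x from the illustrative (not realistic) table in Section 3.1.
l = {0: 100_000, 1: 98_000, 2: 96_500, 3: 95_500}


def p(n, x):
    """n_p_x: probability that (x) is alive at age x + n, as in (3.2)."""
    return l[x + n] / l[x]


def q(n, x):
    """n_q_x: probability that (x) dies before age x + n, as in (3.4)."""
    return 1.0 - p(n, x)


x, n, k = 0, 1, 2  # (0) dies between ages 1 and 3
form_a = (l[x + n] - l[x + n + k]) / l[x]   # (3.5a)
form_b = p(n, x) - p(n + k, x)              # (3.5b)
form_c = p(n, x) * q(k, x + n)              # (3.5c)

# Multiplication rule (3.6): (n+k)_p_x = n_p_x * k_p_{x+n}
lhs, rhs = p(n + k, x), p(n, x) * p(k, x + n)
```

All three forms give 0.025 here, up to floating-point rounding, since 2500 of the original 100 000 lives die between ages 1 and 3.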


3.3 Constructing the life table from the values of $q_x$


The life table is constructed in practice by first obtaining the values of $q_x$ for $x = 0, 1, \ldots, \omega - 1$.
Obtaining these values is a statistical problem which we will not discuss in detail. It is basically 
done by carrying out a study in which we observe how long people of different ages will live. 
For example, if we observe a group of 1000 people of exact age 50 and 10 of them die within 
1 year, then we could estimate $q_{50}$ as 0.01. This of course is an extreme simplification and
the process is much more complicated. It is not practical to gather a group of people exactly 
age 50 at one point of time, and then to observe them for an entire year. In practice, people 
will enter the study at various times, and leave for reasons other than death. In addition, we 
must achieve consistency between values at different ages. The statistical subject known as 
survival analysis deals with these problems. 

In this book we will take the values of $q_x$ as given. The life table can then be constructed
inductively, starting with $\ell_0$, from the formulas


$$d_x = \ell_x q_x, \qquad \ell_{x+1} = \ell_x - d_x, \tag{3.7}$$


which follow immediately from (3.1) and (3.3). We will see, however, that it is not usually
necessary to actually calculate $\ell_x$ and $d_x$. In practice, life tables are specified by just giving
the values of $q_x$, which is sufficient for the necessary computations. The advantage of the
traditional form lies mainly in its intuitive appeal, rather than its use as a calculating tool.
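The inductive construction in (3.7) is short enough to sketch in code. The following Python fragment is an illustration only; the function name `build_life_table` and the `radix` parameter are hypothetical, and the mortality rates are those implied by the unrealistic example table of Section 3.1.

```python
def build_life_table(q, radix=100_000.0):
    """Build (l, d) from mortality rates q_0, q_1, ... using (3.7):
    d_x = l_x * q_x and l_{x+1} = l_x - d_x, starting from l_0 = radix.
    """
    l = [radix]
    d = []
    for q_x in q:
        d_x = l[-1] * q_x
        d.append(d_x)
        l.append(l[-1] - d_x)
    return l, d


# Rates implied by the first four rows of the illustrative table.
q = [2000 / 100_000, 1500 / 98_000, 1000 / 96_500, 900 / 95_500]
l, d = build_life_table(q)
print([round(v) for v in l])  # [100000, 98000, 96500, 95500, 94600]
```

Changing `radix` rescales every entry of both columns by the same factor, which is why the probabilities derived from the table do not depend on the choice of starting figure.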
3.4 Life expectancy


Life expectancy is one of the most frequently quoted actuarial concepts. Probably for this 
reason it is often used incorrectly, as we will explain below. A basic question is the following. 
How long can a person age x expect to live? Of course there is much variety in the future 
lifetime of various individuals of the same age. Some will live for several years, and some 
will die immediately, but we can attempt to arrive at some sort of average figure. One 
approach would be to take a large number of people age x and observe them until all have 
died. We could then compute the total future lifetime of all these individuals. Dividing by 
the number of people in the original group would give an estimate of the desired average. 
For a drastically oversimplified example, take three people exactly age 60. Suppose one dies
at age 62, another at age $72\tfrac{1}{2}$ and the third at age $91\tfrac{1}{4}$. The total future lifetime would be
$2 + 12\tfrac{1}{2} + 31\tfrac{1}{4} = 45\tfrac{3}{4}$. Dividing by 3 we could estimate that a person age 60 could expect
to live on average another $15\tfrac{1}{4}$ years. Of course to be statistically accurate we would need
many more than three people. Moreover, the length of time needed for such a study makes 
it completely impractical. Remarkably however, once we have the life table, we can obtain 
the figure directly, without carrying out the observations. To see this, we look at another 
approach to obtaining total future lifetime. Suppose we start with $\ell_x$ people age x. After 1
year, there will be $\ell_{x+1}$ survivors who would have each contributed 1 year of lifetime to this
total. At the end of 2 years there will be $\ell_{x+2}$ survivors who would have each contributed
another year to the total. Continuing in this way, we can estimate the total future lifetime of all
lives as


$$\ell_{x+1} + \ell_{x+2} + \ell_{x+3} + \cdots + \ell_{\omega-1},$$


and, dividing through by $\ell_x$, we obtain the quantity


$$e_x = \sum_{k=1}^{\omega-x-1} \frac{\ell_{x+k}}{\ell_x} = \sum_{k=1}^{\omega-x-1} {}_kp_x. \tag{3.8}$$


The quantity $e_x$ is known as the curtate life expectancy or curtate expectation of life at age
x. The word curtate means reduced or truncated, reflecting the fact that this is not exactly the
quantity that we want. We have cheated a little in our alternate measurement scheme, for it
measures only whole future years of lifetime and ignores the fraction of the year lived in the
year of death. In our illustration above, for example, under the alternate counting method, the
60-year-old who died at age $72\tfrac{1}{2}$ would be credited with only 12 years of total lifetime rather
than the actual $12\tfrac{1}{2}$. The person who died at $91\tfrac{1}{4}$ would be credited with only 31 years rather
than the actual $31\tfrac{1}{4}$. We are undercounting between 0 and 1 years for each individual, and
it seems reasonable to take this to be one-half on average. The true life expectancy, usually
referred to as the complete life expectancy at age x and denoted by $\mathring{e}_x$, is given approximately by


$$\mathring{e}_x \approx e_x + \tfrac{1}{2}.$$


We will give a more formal treatment of $\mathring{e}_x$ in Chapter 8 and again in Chapter 15.
There is a simple recursion formula to compute $e_x$ for all values of x. From the second
formula in (3.8),


$$\begin{aligned}
e_x &= p_x + {}_2p_x + {}_3p_x + \cdots + {}_{\omega-x-1}p_x \\
&= p_x\left(1 + p_{x+1} + {}_2p_{x+1} + \cdots + {}_{\omega-x-2}p_{x+1}\right) \\
&= p_x(1 + e_{x+1}).
\end{aligned} \tag{3.9}$$


The second line is obtained by using the multiplication rule (3.6). Note that this is a
backward recursion formula as it gives the value of the function in terms of the next higher
argument. The recursion is then started from the initial value of $e_\omega = 0$.

It is instructive to give an intuitive explanation of (3.9). To live any whole number of years 
in the future, (x) must first live to age x + 1, as reflected by the factor $p_x$. The individual will
then have completed 1 year of lifetime, and in addition, being now age x + 1, will complete, 
on average, the expected number of future whole years for a person of that age. 
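As a small illustration of the backward recursion, the Python sketch below computes the curtate expectations for a made-up four-age table and compares the result with the direct sum of survival probabilities. The mortality rates and the function names are hypothetical, chosen only so that the arithmetic can be followed by hand.

```python
def curtate_expectations(q):
    """Return [e_0, ..., e_omega] via e_x = p_x * (1 + e_{x+1}),
    started from e_omega = 0, where omega = len(q).
    """
    omega = len(q)
    e = [0.0] * (omega + 1)
    for x in range(omega - 1, -1, -1):
        e[x] = (1.0 - q[x]) * (1.0 + e[x + 1])
    return e


def curtate_expectation_direct(q, x):
    """Direct sum of k_p_x over k = 1, 2, ..., for comparison."""
    total, survival = 0.0, 1.0
    for q_y in q[x:]:
        survival *= 1.0 - q_y
        total += survival
    return total


# Hypothetical rates for ages 0..3; the last rate is 1, so omega = 4.
q = [0.1, 0.2, 0.5, 1.0]
e = curtate_expectations(q)
print([round(v, 4) for v in e])  # [1.98, 1.2, 0.5, 0.0, 0.0]

# Approximate complete expectation at age 0: add one-half.
complete_0 = e[0] + 0.5
```

The recursion visits each age once, whereas recomputing the direct sum separately for every age repeats most of the work.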

The reader should note carefully that life expectancy is a function of age. For each age x, 
the life expectancy at that age gives the average number of future years that (x) will live. A 
major source of misquoting is to express this as a single figure rather than a function. One can 
often find statements in newspapers or similar sources, stating something like ‘life expectancy 
has increased from 75.3 to 75.8 years’. The writer is invariably referring to the life expectancy 
at age 0 only. This is certainly of interest, but it conveys somewhat limited information. A 
person already aged 80 who wishes to estimate how much longer he/she can expect to live is 
not helped by a statement which claims that newborn lives live on average to age 75.8. 

The reader should also note that life expectancy is the average duration and not the average 
age a person can expect to live to. We say, for example, that the life expectancy at age 50 is 
31.2, meaning that on average a person age 50 can expect to live to age 81.2. There is often 
confusion on this point, which is again a result of the tendency to report only the age 0 figure 
where duration and age are the same. 

There are many other quantities obtainable from the life table which are of interest, but 
only one in particular that we will discuss here. Sometimes we may be interested in the average 
duration lived by (x) over the next n years, where n is some fixed duration. The quantity 


$$\sum_{k=1}^{n} \frac{\ell_{x+k}}{\ell_x} = \sum_{k=1}^{n} {}_kp_x \tag{3.10}$$


is known as the curtate n-year temporary life expectancy at age x. The word temporary comes 
from an analogy with life annuities which we discuss in the next chapter. It gives us the 
expected complete number of years lived over the next n years by people now age x. This is 
what we would compute if we repeated the alternate measurement system described above, 
but ended the observations after n years. To adjust this for the undercounting in the year 
of death requires some care. Those who lived to age x + n will have contributed the correct 
total of n years, and therefore it is only the $(\ell_x - \ell_{x+n})$ people who died during the n-year
period who must be considered in the adjustment. To get a more accurate n-year temporary
life expectancy at age x we add to the quantity in (3.10) not $\tfrac{1}{2}$ but rather $\tfrac{1}{2}\,{}_nq_x$, to
get an approximation to the complete n-year temporary life expectancy at age x of


$$\sum_{k=1}^{n} {}_kp_x + \tfrac{1}{2}\,{}_nq_x.$$
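The temporary expectation and its half-year adjustment can be sketched as follows. This is an illustrative Python fragment with hypothetical function names and made-up mortality rates; it sums the survival probabilities over n years and then adds one-half of the probability of dying within the period.

```python
def temporary_curtate(q, x, n):
    """Return (sum of k_p_x for k = 1..n, n_p_x) from rates q."""
    total, survival = 0.0, 1.0
    for q_y in q[x:x + n]:
        survival *= 1.0 - q_y
        total += survival
    return total, survival


def temporary_complete_approx(q, x, n):
    """Curtate n-year temporary expectation plus (1/2) * n_q_x."""
    curtate, n_p_x = temporary_curtate(q, x, n)
    return curtate + 0.5 * (1.0 - n_p_x)


# Hypothetical rates for ages 0..3.
q = [0.1, 0.2, 0.5, 1.0]
curtate_2, p_2 = temporary_curtate(q, 0, 2)
approx_2 = temporary_complete_approx(q, 0, 2)
print(round(curtate_2, 4), round(approx_2, 4))  # 1.62 1.76
```

Only the 28% of lives who die within the two years receive the half-year adjustment, which is why 0.14 rather than 0.5 is added.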
3.5 Choice of life tables


The life table reflects the fact that age is the major determinant of future mortality. There are of 
course several other factors that affect future lifetime, such as gender, health status, lifestyle, 
and geographical location. In practice, the effect of these factors is handled by producing 
several different life tables. One confines the mortality study that produces the table to a 
particular group of people, namely those with the characteristic that you want to separate out. 
The following are a few of the more important distinctions made in practice. 

It is observed, for reasons that nobody has fully explained, that females live longer than 
males. For the middle range of ages, it is typical for the life expectancy of a female to be from 
5 to 7 years more than that of a male of the same age. To reflect this, it is usual to produce 
separate male and female life tables. 

In recent years, there has been overwhelming statistical evidence to show the dangers of 
smoking. This has led insurance companies to construct different life tables for smokers and 
nonsmokers. 

The choice of life table will also depend heavily on the type of contract that is being 
sold. The life tables produced for the general population from census data are not suitable for 
insurance purposes. People accepted for life insurance policies are usually screened by the 
insurance company to make sure they are in reasonable health. They can expect to live longer 
than a person of the same age taken from the population as a whole. Life tables for insurance 
purposes are constructed by looking at insurance company data only. 

Purchasers of life annuities (discussed in Chapter 4) will on the average live even longer, 
since a person in poor health would be unlikely to buy such a product. Separate tables are 
needed for annuity purposes. 

Still another distinction to be made is the difference between individual contracts and 
group contracts. In the former case the buyer makes a definite decision to enter into the
insurance or annuity contract and is presumably aware of his/her health condition and acting
in his/her best interests. In the latter case, an employer purchases the contract to cover a large
group of employees. Different mortality patterns in the two cases can be expected.

There are many other examples which we will not discuss here, although some will be 
pointed out briefly in succeeding chapters. The reader should be aware that selecting an 
appropriate table for a particular use is an important actuarial task. 

Sometimes a very simple method, known as multiples of standard mortality, is employed 
to construct many different tables from a given one, known as the standard table. For example, 
it might be decided that for risks of a certain type, the mortality is 150% of standard mortality.
The life table for such risks is constructed by multiplying each $q_x$ in the standard table by 1.5.
This method is justified more by simplicity of calculation than by any scientific rationale.
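A minimal sketch of the multiples method in Python is given below. The function name and the sample rates are hypothetical, and capping the result at 1 is an added assumption (the text does not say how rates above 1 are treated), included so that every output remains a valid probability.

```python
def multiple_of_standard(q_standard, multiple):
    """Scale each standard rate q_x by `multiple`.

    Assumption: values are capped at 1 so they remain probabilities.
    """
    return [min(1.0, multiple * q_x) for q_x in q_standard]


# Hypothetical standard rates, rated at 150% of standard mortality.
q_standard = [0.01, 0.4, 0.8]
q_rated = multiple_of_standard(q_standard, 1.5)
print([round(v, 4) for v in q_rated])  # [0.015, 0.6, 1.0]
```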


3.6 Standard notation and terminology 


We have already introduced the standard symbols ${}_np_x$, ${}_nq_x$, $e_x$, $\mathring{e}_x$ and $\omega$.

The symbol ${}_{n|k}q_x$ denotes the probability that (x) will die between the ages of x + n
and x + n + k, the quantity that we already have three ways of writing as shown in (3.5). A
subscript of 1 is omitted, so that ${}_{n|}q_x$ denotes ${}_{n|1}q_x$. This use of a vertical bar is a typical
actuarial device to denote a ‘waiting period’. In this case, the symbol is intended to indicate
that the person will wait n years and then die in the following k years.

The quantity in (3.10) is denoted by $e_{x:\overline{n}|}$ and the complete temporary life expectancy is
denoted by $\mathring{e}_{x:\overline{n}|}$.


3.7 A sample table 


To do the spreadsheet problems in the following chapters that require a life table, we introduce 
a sample table. It is given by


$$q_x = \begin{cases} 1 - e^{-0.00005(1.09)^x}, & x = 0, 1, \ldots, 118, \\ 1, & x = 119. \end{cases} \tag{3.11}$$


We have $\omega = 120$. The formula is easily programmed into a spreadsheet, and that is the reason
for giving the table in this form. Giving an existing table would necessitate entering the figures 
individually. Another advantage of this parametric form is that the two constants of 0.00005 
and 1.09 can be changed to provide a variety of different life tables for comparison purposes. 
Our table is, as stated, a sample table, and it should not be taken as being a realistic picture 
of modern-day mortality. This is true especially at the younger or very old ages, as will be 
discussed later in the text. Chapter 14 will provide some motivation for the formula; see, in 
particular, Exercise 14.11. 
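For readers working outside a spreadsheet, the sample table can be generated in a few lines. The Python sketch below assumes that (3.11) gives $q_x = 1 - e^{-0.00005(1.09)^x}$ for ages up to 118 and $q_{119} = 1$; the function name and the parameter names `a` and `b` are hypothetical labels for the two constants.

```python
import math

OMEGA = 120  # limiting age of the sample table


def sample_q(x, a=0.00005, b=1.09):
    """Mortality rate q_x of the sample table, under the reading of (3.11)
    stated above: 1 - exp(-a * b**x) for x <= 118, and 1 at x = 119.
    """
    if x == OMEGA - 1:
        return 1.0
    return 1.0 - math.exp(-a * b ** x)


q = [sample_q(x) for x in range(OMEGA)]

# A variant table for comparison, changing only the first constant.
q_variant = [sample_q(x, a=0.00006) for x in range(OMEGA)]
```

Passing different values of `a` and `b` produces the family of comparison tables mentioned in the text.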


Notes and references 


London (1997) provides an introduction to survival analysis, and gives more details on the 
construction of life tables. 


Exercises 


Type A exercises 
3.1 You are given that $q_{60} = 0.20$, $q_{61} = 0.25$, $q_{62} = 0.25$, $q_{63} = 0.30$, $q_{64} = 0.40$.
(a) Find $\ell_x$ for ages 60-65, beginning with $\ell_{60} = 1000$.
(b) Find the probabilities of the following:
(i) (61) will die between the ages of 62 and 64.
(ii) (62) will live to age 65.
(c) Given that $e_{65} = 0.8$, find $e_x$ for x = 60-64.


3.2 You are given that ${}_5p_{40} = 0.8$, ${}_{10}p_{45} = 0.6$, ${}_{10}p_{55} = 0.4$. Find the probability that (40)
will die between the ages of 55 and 65. 


3.3 Suppose that out of a typical group of 100 people age 70, 10 will die in the first year, 15
will die in the second year, and 20 will die in the third year. Calculate $q_{70}$, $q_{71}$, $q_{72}$ and
${}_3p_{70}$.
Type B exercises


3.4 Suppose that $\ell_x = 100 - x$, x = 0, 1, ..., 100.
Find expressions for (a) ${}_np_x$, (b) ${}_nq_x$, (c) the probability that (x) will die between the ages
of x + n and x + n + k.


3.5 You are given that $e_{60} = 17$, ${}_{10}p_{50} = 0.8$, and that the 10-year curtate temporary life
expectancy at age 50 is 9.2. Find $e_{50}$.


3.6 Prove the following statement, assuming the approximation given in the text, and give
an intuitive explanation:

$$\mathring{e}_x = \tfrac{1}{2} q_x + p_x(1 + \mathring{e}_{x+1}).$$


3.7 Suppose that $q_x$ is equal to a constant q for all x. (Note that in this case $\omega = \infty$.) Find
expressions in terms of q and n for (a) ${}_np_x$, (b) $e_x$. Do you think that this gives a realistic
life table? Why or why not?


3.8 In constructing a life table for heavy smokers, Actuary A decides to take a standard
table, and double each value of $q_x$, using a value of 1 if $q_x > 0.5$. Actuary B takes the
same table and squares each value of $p_x$. Show that the resulting table of Actuary B has
lower mortality rates at all ages than that of Actuary A.


Spreadsheet exercise 


3.9 Taking the sample life table as given by (3.11), use recursion to find $e_x$ for x =
0, 1, ..., $\omega - 1$. Focus on $e_0$. How much is this reduced if the constant 0.00005 is changed
to 0.00006? What happens if 0.00005 is kept the same, but 1.09 is changed to 1.092?


Life annuities 


4.1 Introduction 


The financial losses that we listed at the beginning of Chapter 1 are no doubt familiar to all 
readers. Another type of risk, which may not be so obvious, is that of living too long. Given our 
definition of risk as the possibility of something bad happening, the reader may wonder about 
this statement. After all, is not living a long life a good rather than bad occurrence? It certainly 
can be, but it does carry with it the possibility of financial hardship if one does not have 
adequate income. Imagine the decision faced by a retired individual who has accumulated 
savings of 1 000 000, which is invested at an annual rate of 5%, providing an annual income
of 50 000. Imagine also that the individual decides that this is an insufficient return, so it 
is necessary to consume a portion of the capital each year, as well as the interest. Doing 
so, however, runs the risk that the entire capital may be depleted before death, leaving the 
individual with no source of income for his/her remaining lifetime. 

Life annuities are a means of insuring against such a risk. A life annuity is a contract 
between an insurance company, known as the insurer, and another party, known as the 
annuitant, which provides the following. In return for the payment by the annuitant of prescribed
premiums, the insurer will provide a sequence of payments, known as annuity benefits,
of prescribed amounts and at prescribed times. The unique provision is that the annuitant
must be alive to receive each benefit payment. These will terminate upon the death of the 
annuitant. 

People who purchase such a contract are investing capital that they have decided will 
not be needed for any dependents after they die. They agree to give up these funds upon 
their death in return for the greater yield they can achieve on their investment by sharing the 
amounts forfeited by those who predecease them. A good way to picture the workings of a 
life annuity is to imagine a room with a number of small boxes, belonging to one annuitant 
each. Each annuitant pays the same premium into his/her box, and interest earnings are added 
to the amounts in his/her box. Each receives the same annuity benefits, which are provided by
taking money out of the box. When one participant dies, his/her box is opened, and the amount 
is spread evenly among all the other boxes in the room. Participants then receive not only 
interest earnings, but also these forfeited amounts of those who die before them. We refer to 
the latter as survivorship earnings. This is a fairly accurate account of what actually happens, 
except that amounts are calculated on paper. There is obviously no need for the physical 
rooms or boxes. More precisely, it is an account of what would happen if mortality followed 
the life table exactly. For most annuities sold in practice, the accumulation is guaranteed 
in advance, so it does not depend on the actual deaths. If fewer people die than expected, 
the insurer would still have to add the extra survivorship amounts to people's ‘boxes’, and 
would experience a loss, while more deaths than expected would result in a gain to the 
insurer. 

To be equitable, all participants in any one room must have roughly the same risk of dying, 
and in particular must be of the same age. A 70-year-old, put in the same room as 20-year-olds, 
would be much more likely to die early on and would clearly be at a disadvantage. We will 
therefore calculate annuity premiums primarily as a function of age. There are of course many 
other factors to consider, such as gender, which we discuss briefly in Section 4.6. 

Insurance companies are not the only source of life annuities. Many pension plans pay 
retirement benefits in the form of a life annuity. 


4.2 Calculating annuity premiums 


We now turn to the mathematical aspects of annuities. The reader should review the vector 
notation given in Section 2.8, which we will make frequent use of. 

We assume that we have fixed throughout an appropriate life table and a discount function 
v to model the effect of investment earnings. 

Consider a contract sold to (x) — or, more accurately, on the life of (x), since the purchaser 
could be a different party than the annuitant, such as an employer. It is assumed throughout 
that $x$ is an integer, which allows us to conveniently use a life table if desired. Indeed, 
the common practice is to classify an individual by their age on their last birthday, which 
is just the normal way that one thinks of age. We suppose that benefit payments are made 
yearly. Suppose that the amount of benefit paid at time k is $c_k$. This could of course be zero, 
indicating that no payment is to be made. The contract is then described by the cash flow 
vector $c = (c_0, c_1, \ldots, c_{\omega-x-1})$. In this case we will refer to c as the annuity benefit vector. 
(Note that the final cash flow is made at age $\omega - 1$, which is the last age at which anyone is living, 
according to our model.) Suppose that the benefits are to be purchased by a single premium 
paid at the beginning of the contract at age x. Our first problem is to calculate the premium 
that the annuitant should pay. The criterion is that the total premiums from all annuitants, 
together with the interest earned, should be sufficient to provide all of the required benefits, 
assuming that invested capital accumulates according to the given discount function and that 
mortality follows that of the given life table. 

Consider the simplest possible case, namely $c = e^k$. This is an annuity consisting of a 
single benefit payment of 1 to be made at time k if (x) is then alive. Such a contract is usually 
referred to as a pure endowment. Let E denote the single premium that should be paid to 
provide for this. Suppose that $\ell_x$ people, each age x, buy such a contract. The total amount 
collected in premiums at time zero would be $\ell_x E$, and this will accumulate to $\ell_x E\,v(k, 0)$ at 
time k. There will be $\ell_{x+k}$ survivors at time k, and each of these must receive 1 unit. This 
means that 

$$\ell_x E\,v(k, 0) = \ell_{x+k}.$$

Solving, we obtain 

$$E = v(k)\,\frac{\ell_{x+k}}{\ell_x} = v(k)\,{}_kp_x.$$


This can be verified intuitively. If the benefit of 1 unit at time k were guaranteed, we 
know from Chapter 2 that we would need to invest v(k). In this case, where the benefit is not 
guaranteed, we multiply by the probability of receiving it to arrive at the premium. 
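The pure endowment argument can be checked numerically. The following Python sketch is only an illustration: the function name and the sample figures (1000 buyers, 900 survivors, v(k) = 0.5) are assumptions, not values from the text. It computes E and confirms that the accumulated premiums provide exactly 1 unit per survivor.

```python
def pure_endowment_premium(v_k, l_x, l_x_plus_k):
    """Single premium E = v(k) * l_{x+k} / l_x for 1 paid at time k on survival."""
    return v_k * l_x_plus_k / l_x


# Hypothetical inputs: 1000 buyers at age x, 900 survivors at time k, v(k) = 0.5.
l_x, l_x_plus_k, v_k = 1000, 900, 0.5
E = pure_endowment_premium(v_k, l_x, l_x_plus_k)

# Total premiums l_x * E accumulate at interest to l_x * E / v(k) at time k.
fund_at_k = l_x * E / v_k
per_survivor = fund_at_k / l_x_plus_k

print(E, per_survivor)
```

With these figures the premium is the interest-only price 0.5 multiplied by the survival probability 0.9.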

The general case easily follows. Let $\ddot{a}_x(c)$ denote the single premium for a life annuity on 
(x) with benefit vector c. This contract can be viewed as a sequence of pure endowments, one 
for each value of k, where the kth pure endowment pays $c_k$ at time k. The premium for such a 
pure endowment is just $c_k\,v(k)\,{}_kp_x$, and the total premium is obtained by summation. We have 

$$\ddot{a}_x(c) = \sum_{k=0}^{\omega-x-1} c_k\,v(k)\,{}_kp_x. \tag{4.1}$$


Life annuities have been classified into various types. An annuity for which benefit 
payments are made for a fixed number of years and then cease is known as a temporary life 
annuity (an example for those who collect oxymorons). 

The most common type of annuity is one for which benefits continue for as long as the 
annuitant lives. This is known as a whole life annuity. Purchasers of whole life annuities 
protect themselves completely against outliving their available capital, for no matter how long 
they live, the income from the annuity will continue. We have no need to distinguish between 
temporary and whole life annuities in our mathematical model. As indicated, we consider all 
annuities on (x) as running from time 0 to time $\omega - 1 - x$. In the case of a temporary life 
annuity we simply take benefits equal to 0 after the last positive payment. In real life, however, 
there is a difference. The insurer can obviously not inform a person who has reached the age 
of $\omega$ that according to the insurer’s model they are no longer alive and annuity payments will 
stop. This means that as a practical matter the benefit payments on a whole life annuity must 
eventually be constant (or at least follow some regular pattern) so it is clear what amount to 
continue to pay should an annuitant live beyond $\omega$. In most tables, $\omega$ is set sufficiently high 
that such an occurrence is extremely rare. 

Deferred annuities are contracts where the initial payment does not commence for several 
years. For example, a person wishes to provide for an income beginning at retirement. Math- 
ematically, this simply means that the initial entries in the benefit vector will be zero, so no 
special treatment is required. The annuity benefit vector will be of the form $c = (0_k, d)$, and 
the duration k is known as the deferred period. 


Example 4.1 A temporary life annuity on (50) provides for 1 payable at age 50, 2 payable at 
age 51, 3 payable at age 52, and 4 payable at age 53. Suppose $q_{50} = 0.1$, $q_{51} = 0.2$, $q_{52} = 0.25$. 
The interest rate is 50% for the first year and 100% thereafter. Find the single premium. 

Solution. For a small problem such as this, it is convenient to make use of a time diagram as 
introduced in Chapter 2. See Figure 4.1. 


Benefits 1 2 3 4 
| | | | 
Time 0 1 2 3 
v's 2/3 1/2 1/2 
q's 0.1 0.2 0.25 


Figure 4.1 Example 4.1 


Here we have inserted values of $q_y$ for relevant ages y, as well as the 1-year interest 
discount factors. We now must multiply each payment $c_k$ by the product of all previous values 
of v(i, i + 1) as we did in Chapter 2, and also by ${}_kp_x$, which we calculate as the product of all 
the previous values of $p_y$. The single premium equals 

$$1 + \left(2 \times \tfrac{2}{3} \times 0.9\right) + \left(3 \times \tfrac{2}{3} \times \tfrac{1}{2} \times 0.9 \times 0.8\right) + \left(4 \times \tfrac{2}{3} \times \tfrac{1}{2} \times \tfrac{1}{2} \times 0.9 \times 0.8 \times 0.75\right) = 3.28.$$


An alternate way of calculating the survival probabilities, which some may prefer, is to 
use (3.2). We can construct part of a life table, utilizing (3.7). In this example, starting with 
say $\ell_{50} = 1000$ we would get in turn $d_{50} = 100$, $\ell_{51} = 900$, $d_{51} = 180$, $\ell_{52} = 720$, $d_{52} = 180$, 
$\ell_{53} = 540$. This automatically performs all the multiplication involving the values of $p_y$, 
and gives us the factors of 0.9, 0.72, 0.54, used above. 
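The calculation in Example 4.1 can be reproduced with a few lines of Python. This is a minimal sketch under the example's own data; the helper names (`survival_probs`, `discount_factors`, `life_annuity_pv`) are illustrative choices, not notation from the text.

```python
from itertools import accumulate
from operator import mul


def survival_probs(q):
    """Return [0p_x, 1p_x, ..., np_x] from one-year death rates q_x, q_{x+1}, ..."""
    return list(accumulate([1.0] + [1.0 - qy for qy in q], mul))


def discount_factors(rates):
    """Return [v(0), v(1), ..., v(n)] from one-year interest rates i_0, i_1, ..."""
    return list(accumulate([1.0] + [1.0 / (1.0 + i) for i in rates], mul))


def life_annuity_pv(c, rates, q):
    """Single premium: sum over k of c_k * v(k) * kp_x."""
    v = discount_factors(rates)
    p = survival_probs(q)
    return sum(ck * vk * pk for ck, vk, pk in zip(c, v, p))


# Data of Example 4.1: benefits 1, 2, 3, 4; interest 50% then 100%; q = 0.1, 0.2, 0.25.
premium = life_annuity_pv([1, 2, 3, 4], [0.5, 1.0, 1.0], [0.1, 0.2, 0.25])
print(round(premium, 2))  # 3.28
```

The survival factors produced by `survival_probs` are 1, 0.9, 0.72 and 0.54, matching those obtained from the partial life table.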


4.3 The interest and survivorship discount function 


4.3.1 The basic definition 


It is important to note that the formula for $\ddot{a}_x(c)$ is the same as that given for $\ddot{a}(c)$ in Chapter 2, 
but with v(k) replaced by $v(k)\,{}_kp_x$ for each value of k. This suggests that we are in the general 
situation of Chapter 2, but with a different discount function. This is indeed the case. Define 

$$y_x(n) = v(n)\,{}_np_x$$

and extend to a two-variable function 

$$y_x(k, n) = \frac{y_x(n)}{y_x(k)}$$

for all nonnegative integers k and n. It is clear that $y_x$ satisfies (2.1) and therefore is a discount 
function. This fact is borne out by the discussion of the pure endowment in the previous 
section, which showed that the value at age x of 1 paid at age x + k, when accumulation is by 
interest and survivorship, is precisely $y_x(k)$. 

Using the multiplication rule we can verify that for k < n 

$$y_x(k, n) = v(k, n)\,{}_{n-k}p_{x+k} \tag{4.2}$$

and for k > n 

$$y_x(k, n) = \frac{v(k, n)}{{}_{k-n}p_{x+n}}. \tag{4.3}$$

So for example, assuming constant interest i, 

$$y_x(2, 5) = v^3\,{}_3p_{x+2}, \qquad y_x(5, 2) = \frac{(1 + i)^3}{{}_3p_{x+2}}.$$

Note that the last quantity, giving the amount at time 5 resulting from an investment of 1 at 
time 2, is greater than $(1 + i)^3$, the amount resulting from interest only. This reflects the extra 
earnings due to survivorship. 
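To make the effect of survivorship on accumulation concrete, the sketch below evaluates the two-variable function under constant interest. The interest rate and mortality rates are hypothetical values chosen for illustration, and the function name `y_two_var` is an assumption, not notation from the text.

```python
from math import prod


def y_two_var(k, n, i, q):
    """y_x(k, n) = y_x(n) / y_x(k) at constant interest rate i.

    q[j] is the one-year death rate q_{x+j}."""
    def y(m):
        # y_x(m) = v^m * mp_x
        return (1.0 + i) ** (-m) * prod(1.0 - qj for qj in q[:m])
    return y(n) / y(k)


i = 0.05                                  # hypothetical constant interest rate
q = [0.01, 0.02, 0.03, 0.04, 0.05]        # hypothetical q_x, ..., q_{x+4}

accum = y_two_var(5, 2, i, q)             # value at time 5 of 1 invested at time 2
interest_only = (1.0 + i) ** 3
surv_3 = (1 - 0.03) * (1 - 0.04) * (1 - 0.05)   # 3p_{x+2}

print(accum, interest_only)
```

Here `accum` equals `interest_only / surv_3`, so it exceeds the interest-only accumulation whenever some mortality is present.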

We will refer to $y_x$ as an interest and survivorship discount function. From (4.1) it follows 
that 

$$\ddot{a}_x(c) = \ddot{a}(c; y_x), \tag{4.4}$$

and is therefore just the present value of the payments, with respect to the discount function $y_x$. 
The significance of the above is that it shows that all the results of Chapter 2 can be carried 
over directly to life annuities. 

As a matter of terminology we will interchangeably refer to $\ddot{a}_x(c)$, and similar quantities 
introduced later, as net single premiums or present values. 

When calculating $\ddot{a}_x(c)$ by spreadsheets it is convenient to make use of the recursion 
formula (2.7) adapted to the interest and survivorship function. We have 

$$y_x(k + 1) = y_x(k)\,v(k, k + 1)\,p_{x+k}. \tag{4.5}$$
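The recursion for the interest and survivorship function translates directly into a loop. The following sketch uses hypothetical rates, and the function names are illustrative assumptions. It builds the vector recursively and compares each entry with the direct product of the interest discount and the survival probability.

```python
def y_by_recursion(rates, q):
    """Build [y_x(0), ..., y_x(n)] via y_x(k+1) = y_x(k) * v(k, k+1) * p_{x+k}."""
    y = [1.0]
    for i_k, q_k in zip(rates, q):
        y.append(y[-1] / (1.0 + i_k) * (1.0 - q_k))
    return y


def y_direct(n, rates, q):
    """y_x(n) = v(n) * np_x computed as two separate products."""
    v_n, p_n = 1.0, 1.0
    for k in range(n):
        v_n /= 1.0 + rates[k]
        p_n *= 1.0 - q[k]
    return v_n * p_n


rates = [0.04, 0.04, 0.08]   # hypothetical i_0, i_1, i_2
q = [0.05, 0.10, 0.15]       # hypothetical q_x, q_{x+1}, q_{x+2}
y = y_by_recursion(rates, q)
print(y)
```

The recursive form needs only one multiplication per row, which is why it suits a spreadsheet column that is filled by copying down.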


Notation Consider the vector $1_{\omega-x}$, which refers to a 1-unit cash flow sequence continuing for 
the life of (x). Due to the prevalence of this symbol, it is suppressed for notational convenience. 
This is a standard convention of actuarial notation that will be followed throughout the book. 
In general, whenever a vector is omitted, it is understood to be $1_{\omega-x}$. For example, $\ddot{a}_x$ denotes 
the net single premium for a whole life annuity on (x), paying 1 per year for life, beginning at 
time 0. 

In cases where payments run for life, but are not necessarily constant, it is sometimes 
convenient to avoid specifying $\omega$ by using the symbol $1_\infty$ to refer to a vector with entries 
of 1, running to age $\omega - 1$. For example, if $\omega = 120$ we could write $\ddot{a}_{40}(2_{15}, 1_\infty)$ in place of 
$\ddot{a}_{40}(2_{15}, 1_{65})$. 

In what follows we will keep v as the notation for the investment discount function, which 
reflects the discounting resulting from earnings on invested capital. We will use the letter y 
to denote a general discount function, which could be v or $y_x$ or any other such function. 
Life annuity symbols will be written as given with the age as a subscript. For present values 
involving discounting at interest only, we will then incorporate the v and write $\ddot{a}(c; v)$. 


4.3.2 Relations between $y_x$ for various values of x 


The reader should note that we have several interest and survivorship functions, one for each 
value of the initial age x. They are, however, related in the following way. 
From (4.2) 

$$(y_x \circ k)(n) = y_x(k, k + n) = v(k, k + n)\,{}_np_{x+k} = (v \circ k)(n)\,{}_np_{x+k}.$$

This tells us that 

$$y_x \circ k \ (\text{calculated with } v) = y_{x+k} \ (\text{calculated with } v \circ k). \tag{4.6}$$

Therefore, if interest is constant, 

$$y_{x+k} = y_x \circ k. \tag{4.7}$$

In particular, with constant interest $y_x = y_0 \circ x$, so that knowing the interest and survivorship 
function for initial age 0 determines it for all initial ages. 

In the remainder of this section, we continue with the notational ideas first introduced in 
Section 2.11, with the intention of providing a background for following classical actuarial 
formulas. It can be safely omitted at first, as everything can be conveniently expressed by 
using the new notation that we introduced in Chapter 2 and discussed in Section 2.14.2. We 
do however occasionally refer back to this section as well as Section 2.14.2. In particular the 
symbols $v \circ k$ and $c \circ k$ prove to be convenient and will be used in various places. 

In the first place, since we do not assume constant interest as was traditionally done, 
it is necessary to introduce a new notational device into the classical life annuity symbols 
in order to handle time shifting. The idea is to separate age from duration in the quantity 
x + k. We let $\ddot{a}_{[x]+k}(c)$ denote the present value of a life annuity on (x + k), paying $c_i$ at 
time i, and calculated with respect to the investment discount function $v \circ k$. Of course with 
constant interest, $v \circ k = v$, so that $\ddot{a}_{[x]+k}(c) = \ddot{a}_{x+k}(c)$ for all k, and this notation is not needed. 
(However, even with constant interest we will have need of this notation with a more refined 
mortality model which we discuss in Chapter 9.) 

From (2.28) we can write 

$$\ddot{a}_{[x]+k}(c \circ k) = \mathrm{Val}_k({}^kc; y_x), \tag{4.8}$$

so the symbol on the left hand side is conveniently interpreted as the value at time k of the 
benefits at and after time k, paid on a life annuity issued to (x) with benefit vector c. 
The above equation leads to the traditional form of the life annuity splitting identity. 

$$\ddot{a}_x(c) = \ddot{a}_x({}_kc) + y_x(k)\,\ddot{a}_{[x]+k}(c \circ k). \tag{4.9}$$

For a particular case of (4.9), assume constant interest and let $c = 1_\infty$. Then ${}_kc = 1_k$ and 
$c \circ k = 1_\infty$. Using (3.2) we can write 

$$\ddot{a}_x = \ddot{a}_x(1_k) + v^k\,\frac{\ell_{x+k}}{\ell_x}\,\ddot{a}_{x+k}. \tag{4.10}$$


This was a popular formula in pre-computer days. It was used to calculate values of temporary 
annuities $\ddot{a}_x(1_k)$ from a table of values of whole life annuities and the life table. With modern 
spreadsheets, we do not need this or similar formulas for calculation purposes. They are 
however sometimes useful to explain relationships between various quantities. 
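As an illustration of how the constant-interest splitting formula relates whole life and temporary annuities, the sketch below checks it numerically. The mortality rates, the interest rate and the function name `annuity_due` are hypothetical choices made for this example.

```python
def annuity_due(q, v, n=None):
    """PV of 1 per year at times 0..n-1 (for life if n is None), constant discount v.

    q[j] is the one-year death rate j years after issue.
    Returns (present value, y(n)), where y(n) = v^n * np."""
    n = len(q) if n is None else n
    pv, y = 0.0, 1.0
    for k in range(n):
        pv += y                    # payment of 1 at time k, discounted by y(k)
        y *= v * (1.0 - q[k])      # y(k+1) = y(k) * v * p_{x+k}
    return pv, y


q = [0.1, 0.2, 0.3, 0.5, 1.0]      # hypothetical rates; the last age has q = 1
v, k = 1 / 1.06, 2

whole, _ = annuity_due(q, v)           # whole life annuity on (x)
temp, y_k = annuity_due(q, v, k)       # k-year temporary annuity and v^k * kp_x
later, _ = annuity_due(q[k:], v)       # whole life annuity on (x + k)

print(whole, temp + y_k * later)
```

Rearranging the same identity gives the temporary annuity as `whole - y_k * later`, which is how the formula was used with printed tables.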

Formula (4.9) is often applied to express the present value of a deferred annuity $\ddot{a}_x(c)$, 
where $c = (0_k, d)$, directly in terms of the vector d. Since ${}_kc = 0_k$ and $c \circ k = d$, we have 
immediately that 


$$\ddot{a}_x(c) = y_x(k)\,\ddot{a}_{[x]+k}(d). \tag{4.11}$$


Example 4.2 This example is designed to further illustrate the square bracket notation. Suppose 
that $q_{60} = 0.05$, $q_{61} = 0.10$, $q_{62} = 0.15$, $i_0 = i_1 = 0.04$ and $i_k = 0.08$ for $k \ge 2$. Calculate 
$\ddot{a}_{60}(1_4)$, $\ddot{a}_{[59]+1}(1_4)$, $\ddot{a}_{[58]+2}(1_4)$. 


Solution. 

$$\ddot{a}_{60}(1_4) = 1 + (1.04)^{-1}(0.95) + (1.04)^{-2}(0.95)(0.90) + (1.04)^{-2}(1.08)^{-1}(0.95)(0.90)(0.85) = 3.33.$$

$$\ddot{a}_{[59]+1}(1_4) = 1 + (1.04)^{-1}(0.95) + (1.04)^{-1}(1.08)^{-1}(0.95)(0.90) + (1.04)^{-1}(1.08)^{-2}(0.95)(0.90)(0.85) = 3.27.$$

This could be interpreted as the amount that a person now age 59 would have to pay for a 
1-unit 4-year annuity if the purchase was made at the end of 1 year (assuming no change in 
the life table). It is lower than the first amount, since there is only 1 year of low interest rates 
rather than two. 

$$\ddot{a}_{[58]+2}(1_4) = 1 + (1.08)^{-1}(0.95) + (1.08)^{-2}(0.95)(0.90) + (1.08)^{-3}(0.95)(0.90)(0.85) = 3.19.$$

This could be interpreted as the amount that a person now age 58 would have to pay for a 
1-unit 4-year annuity if the purchase was made at the end of 2 years. It is lower yet, since the 
buyer would avoid both years of low interest. 
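The three values in Example 4.2 can be verified with a short script. The sketch below is one possible way to organize the calculation. The function name `annuity_pv` is an illustrative assumption, and the time shift of the discount function is modelled simply by dropping the first k interest rates.

```python
def annuity_pv(c, rates, q):
    """Sum of c_k * y(k), with y(k+1) = y(k) * (1 - q[k]) / (1 + rates[k])."""
    pv, y = 0.0, 1.0
    for k, ck in enumerate(c):
        pv += ck * y
        if k < len(c) - 1:
            y *= (1.0 - q[k]) / (1.0 + rates[k])
    return pv


# Data of Example 4.2: i_0 = i_1 = 0.04, i_k = 0.08 for k >= 2 (a few extra years listed).
rates = [0.04, 0.04, 0.08, 0.08, 0.08, 0.08]
q_from_60 = [0.05, 0.10, 0.15]          # q_60, q_61, q_62

# Shifting the discount function by k years corresponds to rates[k:].
values = [annuity_pv([1, 1, 1, 1], rates[k:], q_from_60) for k in (0, 1, 2)]
print([round(x, 2) for x in values])  # [3.33, 3.27, 3.19]
```

The mortality rates are the same in all three cases because the annuitant is age 60 at purchase each time; only the interest rates in force differ.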


4.4 Guaranteed payments 


Suppose an individual purchases a life annuity and dies immediately after paying the premium. 
The person will get nothing back, except the benefit payment at time zero if that is positive. 
While we have seen that this is fair, the situation is not always well understood by the 
dependents of the annuitant, who may well complain that the insurer has confiscated the funds. 
Aside from this, many prospective purchasers are uneasy with the possibility that all or a large 
portion of their money could be lost. To provide additional flexibility, insurers frequently 
offer life annuities with a guaranteed period. A typical contract of this type would stipulate 
that for a certain duration (commonly 10 or 15 years) the benefits will be paid regardless of 
whether the annuitant is alive or not. After this guaranteed period the contract reverts to a life 
annuity and benefits are conditional on survival. The guaranteed period of course means that 
a higher premium must be paid for the same level of benefits. To calculate the premium, it is 
best to consider such an annuity as two separate contracts, one for the guaranteed payments, 
the other for the contingent payments, and add the respective premiums. Examples follow. 


Example 4.3 A person age 40 purchases a life annuity that provides 10 000 each year for 
life, with the first payment starting at age 41. The first 10 payments will be paid regardless of 
whether the annuitant is alive or not. Find a formula for the single premium. 


Solution. The premium for the guaranteed annuity is 

$$10\,000\ \ddot{a}(c; v), \qquad c = (0, 1_{10}).$$

The premium for the non-guaranteed annuity is 

$$10\,000\ \ddot{a}_{40}(f), \qquad f = (0_{11}, 1_\infty).$$

Thus the total premium is just the sum 

$$10\,000\,[\ddot{a}(c; v) + \ddot{a}_{40}(f)].$$


Let us verify the duration in the vector f. The first guaranteed payment is at time 1, so the last 
guaranteed payment is at time 10, and the first non-guaranteed payment is at time 11. As we 
start the indexing with 0, there will be 11 zeros in the non-guaranteed vector. 
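The two-contract view of a guaranteed period can be sketched in code as follows. This is a minimal illustration with hypothetical data (the rates of Example 4.1, not those of Example 4.3), and the function name is an assumption. Guaranteed amounts are discounted at interest only, while contingent amounts are discounted at interest and survivorship.

```python
def guaranteed_life_annuity_pv(guaranteed, contingent, rates, q):
    """PV of an annuity split into guaranteed and contingent payments.

    guaranteed[k]: amount paid at time k regardless of survival
    contingent[k]: amount paid at time k only if (x) is then alive"""
    v, p, pv = 1.0, 1.0, 0.0
    n = max(len(guaranteed), len(contingent))
    for k in range(n):
        g = guaranteed[k] if k < len(guaranteed) else 0.0
        c = contingent[k] if k < len(contingent) else 0.0
        pv += g * v + c * v * p        # interest only vs interest and survivorship
        if k < n - 1:
            v /= 1.0 + rates[k]
            p *= 1.0 - q[k]
    return pv


rates = [0.5, 1.0, 1.0]                # hypothetical one-year interest rates
q = [0.1, 0.2, 0.25]                   # hypothetical one-year death rates

with_guarantee = guaranteed_life_annuity_pv([1, 1], [0, 0, 1, 1], rates, q)
straight = guaranteed_life_annuity_pv([], [1, 1, 1, 1], rates, q)
print(with_guarantee, straight)
```

As expected, guaranteeing the first two payments raises the premium above that of the straight life annuity with the same benefits.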


Example 4.4 A person age 40 purchases a life annuity that provides 1 per year for life with 
the first payment at age 65. If (40) lives to age 65 he/she will receive at least 10 payments. 
Nothing is paid if death occurs before age 65. Find a formula for the premium. 


Solution. The premium for the non-guaranteed annuity is $\ddot{a}_{40}(0_{35}, 1_\infty)$. 

One must be careful with the ‘guaranteed’ portion here, which is deferred and therefore 
not completely guaranteed. The value of this at time 25 is $\mathrm{Val}_{25}(0_{25}, 1_{10}; v)$. To get the value 
at time 0, we multiply this by $y_{40}(25)$. It would be wrong to multiply by v(25), since these 
payments are not made if (40) dies before age 65, so that accumulation is at interest and 
survivorship for the first 25 years. The total premium is therefore 

$$\ddot{a}_{40}(0_{35}, 1_\infty) + y_{40}(25)\,\mathrm{Val}_{25}(0_{25}, 1_{10}; v).$$

The more traditional way of writing this, incorporating our time shifting notation to handle 
non-constant interest, is 

$$v(35)\,{}_{35}p_{40}\,\ddot{a}_{[40]+35} + v(25)\,{}_{25}p_{40}\,\ddot{a}(1_{10}; v \circ 25).$$

To handle the general case of guaranteed payments, let u denote a vector of benefit 
payments that are guaranteed provided the annuitant lives to time g. So in Example 4.3, g = 0 
and $u = (0, 1_{10})$, whereas in Example 4.4, g = 25 and $u = (0_{25}, 1_{10})$. Clearly, we can assume 
that the first g entries are equal to 0 so that u is of the form $(0_g, w)$. Then, reasoning as above, 
the present value of the guaranteed payments is 

$${}_gp_x\,\ddot{a}(u; v) = v(g)\,{}_gp_x\,\ddot{a}(w; v \circ g),$$

which equals the present value with respect to the discount function v, multiplied by the 
probability that the payments will be made. 


See Exercise 4.15 for one variation on the guaranteed payment concept. We present more 
modifications in the next chapter. See Example 5.11 and Exercise 5.19. 


4.5 Deferred annuities with annual premiums 


Deferred annuities are often purchased by a series of annual premiums rather than a single 
premium. The premium payment period can be any length that does not exceed the deferred 
period, so that premiums stop when the benefit payments commence. Deferred annuities 
with annual premiums are frequently used to provide pensions. An individual, together with 
his/her employer, will pay premiums during his/her working years in order to provide income 
beginning at retirement. 

The sequence of annual-premium payments in the contract can be described by means of 
a premium pattern vector. This is a vector $\rho$ with $\rho_0 = 1$. Then $\rho_k$ denotes the ratio of the 
premium payable at time k to the premium payable at time 0. If we know the premium payable 
at time 0, often referred to as the initial premium, the premium pattern vector will determine 
all premiums. Namely, if $\pi_k$ denotes the premium payable at time k, 

$$\pi_k = \pi_0\,\rho_k. \tag{4.12}$$

The vector $\pi = (\pi_0, \pi_1, \ldots)$ will be called the premium vector. 

Suppose that (x) purchases a life annuity to begin at age x + n. The most common premium pattern in 
practice would be $(1_n)$, signifying a level premium payable until the income commences. 
However, many other patterns are encountered. Some annuitants may prefer to pay a higher 
premium but for a shorter period, adopting a pattern of $(1_m)$ for some m < n. Others may 
prefer to pay a lower premium for an initial period, and then more at the end. For example, 
the premium pattern vector $(1_k, 2_{n-k})$ would call for the premium to double after k years. 

The premium payments also constitute a life annuity on (x) since they will cease upon (x)'s 
death. (In this case they are paid by (x) rather than to (x).) To achieve the goal that premiums 
together with investment earnings are sufficient to provide the required annuity payments, the 
actuary will set premiums to be actuarially equivalent to benefits, with respect to the interest 
and survivorship discount function. 


Example 4.5 (Figure 4.2). An annuity on (40) provides for 1 annually for life, beginning 
at age 65. Nothing is paid for death before 65. Annual premiums are payable for 25 years 
beginning at age 40. The premium reduces by one-half after 15 years. Find a formula for the 
initial premium $\pi_0$, assuming $\omega = 110$. 


Benefits    0      0      ...   0      0        ...   0        1     1     ...   1 
Time        0      1      ...   14     15       ...   24       25    26    ...   69 
Premiums    π_0    π_0    ...   π_0    π_0/2    ...   π_0/2 


Figure 4.2 Example 4.5 

Solution. Let $\pi$ denote the premium vector and $\rho$ the premium pattern vector. Equating present values, we have 

$$\ddot{a}_{40}(c) = \ddot{a}_{40}(\pi) = \pi_0\,\ddot{a}_{40}(\rho),$$

where 

$$c = (0_{25}, 1_{45}), \qquad \rho = (1_{15}, 0.5_{10}).$$

Solving, we obtain 

$$\pi_0 = \frac{\ddot{a}_{40}(c)}{\ddot{a}_{40}(\rho)}.$$
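The same equivalence principle can be sketched in code: the initial premium is the present value of the benefits divided by the present value of the premium pattern, both taken with respect to the interest and survivorship function. The data below are hypothetical toy values (not those of Example 4.5), and the function names are illustrative assumptions.

```python
def discount_vector(rates, q):
    """Interest and survivorship factors [y(0), y(1), ...] built recursively."""
    y = [1.0]
    for i_k, q_k in zip(rates, q):
        y.append(y[-1] * (1.0 - q_k) / (1.0 + i_k))
    return y


def present_value(c, y):
    """Present value of cash flow vector c with respect to discount vector y."""
    return sum(ck * yk for ck, yk in zip(c, y))


# Hypothetical toy contract: benefits of 1 at times 2 and 3,
# premiums at times 0 and 1 with the premium halving after the first year.
y = discount_vector([0.5, 1.0, 1.0], [0.1, 0.2, 0.25])
benefits = [0, 0, 1, 1]
pattern = [1, 0.5, 0, 0]

initial_premium = present_value(benefits, y) / present_value(pattern, y)
premiums = [initial_premium * r for r in pattern]
print(initial_premium)
```

Valuing the resulting premium vector with the same discount vector returns the present value of the benefits, which is the check suggested for the spreadsheet as well.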


4.6 Some practical considerations 


4.6.1 Gross premiums 


The annuity premiums, both single and annual, that we calculated in the previous sections are 
referred to as net premiums or benefit premiums. They are the premiums that are required to 
provide the benefits. The premiums that are actually charged in practice are known as gross 
premiums, and involve other factors in their calculation. The insurer must make provision for 
amounts to cover expenses and profits. In addition, a contingency charge is added to provide 
for adverse experience. After all, interest earnings may fall short of those predicted by the 
discount function or people may live longer than predicted by the given life table. Moreover, 
marketing considerations inevitably play a role. If an insurer wants to stay in operation it must 
ensure that the premiums it actually charges are competitive. The many details that go into 
the calculation of the actual gross premiums are beyond the scope of this book, although we 
will touch on this topic in Chapter 12. In any event, the initial basic step is to calculate the net 
premiums as done in the examples of this chapter. 


4.6.2 Gender aspects 


The role of gender in life annuities is a controversial issue. As noted above, the participants 
in a ‘female age x room’ can expect to live longer than those in the ‘male age x room’. The 
females will receive less money in forfeiture due to death, and therefore must invest more to 


STANDARD NOTATION AND TERMINOLOGY 57 


receive back the same amounts. If the insurer uses separate life tables for males and females, 
the result is that females must pay higher premiums to receive the same annuity benefits. This 
has in particular caused controversy in connection with defined contribution pension plans. 
In such a plan, the employee and employer make specified contributions into a fund and the 
accumulated amounts are then paid out as some form of life annuity when the employee retires. 
(This is as opposed to a defined benefit plan, where the annuity income is fixed as a function 
of length of service and salary.) If separate life tables are used, the same level of contributions 
will purchase smaller pension benefits for a female than for a male of the same age. This has 
resulted in charges of discrimination by gender. There have been lengthy debates on this issue 
and many points of view put forward. Some question the fairness of considering only gender 
while ignoring other factors. For example, a female smoker may actually expect to live a 
shorter period than a male nonsmoker of the same age, but will still receive a smaller pension 
if only gender is taken into account. The issue is complicated and we will not go into it further 
here. The trend lately has been to agree that gender discrimination should be avoided, and 
many pension plans now use ‘unisex life tables’ that are a blend of the corresponding male and 
female tables. The result is that, for defined contribution plans, males receive somewhat lower 
pension benefits, and females receive somewhat higher pension benefits, than they would have 
received had separate male and female tables been used. 


4.7 Standard notation and terminology 


Standard notation for life annuities generally follows that for general annuities as introduced 
in Section 2.14, except that the age x is inserted as a right subscript, as we have done. So, for 
example, 
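$$\ddot{a}_{x:\overline{n}|} = \ddot{a}_x(1_n), \qquad a_{x:\overline{n}|} = \ddot{a}_x(0, 1_n);$$
$$\ddot{s}_{x:\overline{n}|} = y_x(n)^{-1}\,\ddot{a}_x(1_n), \qquad s_{x:\overline{n}|} = y_x(n)^{-1}\,\ddot{a}_x(0, 1_n);$$
$$(Ia)_{x:\overline{n}|} = \ddot{a}_x(0, 1, 2, \ldots, n), \qquad (I\ddot{a})_{x:\overline{n}|} = \ddot{a}_x(1, 2, \ldots, n);$$
$$(Da)_{x:\overline{n}|} = \ddot{a}_x(0, n, n - 1, n - 2, \ldots, 1), \qquad (D\ddot{a})_{x:\overline{n}|} = \ddot{a}_x(n, n - 1, n - 2, \ldots, 1);$$
$${}_{k|n}\ddot{a}_x = {}_{k-1|n}a_x = \ddot{a}_x(0_k, 1_n).$$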


The last item is the present value of a deferred annuity where the first payment is at time k, 
and there are n payments. In some cases, as here, a duration appears as a left subscript rather 
than under an 'angle'. To understand the second symbol, note that the unadorned a indicates 
a first payment at time 1, and the k — 1 before the deferred symbol indicates that payments 
are k — 1 years later than this, that is at time k, which is the same as the first symbol. 

The present value of a k-year pure endowment of 1 is denoted by ${}_kE_x$, another example 
where duration is given by a left subscript. 

We have already introduced the symbol $\ddot{a}_x$, in which the omitted duration symbol indicates 
a whole life annuity with payments of 1 for life. In the standard notation this carries over to 
the other symbols as well. That is, 

$$a_x, \quad (Ia)_x, \quad (I\ddot{a})_x, \quad {}_{k|}\ddot{a}_x = {}_{k-1|}a_x$$

are all defined as the corresponding symbols above, with benefits continuing for as long as 
(x) is alive. In other words, n is taken to be $\omega - x - j$, where j is the time of the first nonzero 
payment. 

Traditional actuarial texts include many identities relating these quantities. For the most 
part, they all follow from the splitting identity (4.9). For example (assuming constant interest), 
we get the following variation of (4.10), written in standard notation as 

$$\ddot{a}_{x:\overline{n}|} = \ddot{a}_{x:\overline{k}|} + v^k\,{}_kp_x\,\ddot{a}_{x+k:\overline{n-k}|}.$$


The term annuity certain or fixed term annuity is often used to describe annuities where 
the payments are certain to be made, as distinguished from life annuities. 


4.8 Spreadsheet calculations 


In order to handle problems involving a life table, we proceed as follows. As in the Chapter 
2 spreadsheet we put duration in column A, and interest rates in column B, starting at row 
10. We insert the life table in column N (columns in between are reserved for other purposes, 
described in later chapters), with $q_y$ entered in cell N(10 + y). Our sample table can be input 
as follows. Enter the two parameters 0.0005 in cell N3, 1.09 in cell N4. This allows us to 
change parameters when desired. Then enter the formula 


= 1 - EXP(-N$3 * N$4^$A10) 


in cell N10 and copy down to cell N128. Enter 1 in cell N129. It is important to also ensure 
that the remaining cells in Column N are filled with zeros up to at least N250. 

In cell C1 we insert the age x. We then use the INDEX function in Excel to select the 
pertinent mortality rates, putting $q_{x+t}$ in cell C(10 + t). This is done by inserting the formula 


=INDEX(N$10:N$250, C$1 + $A11) 


in cell C10 and copying down. Entries of zero will appear after duration $\omega - x$ coming from 
the extra zeros in column N. 

We calculate v(k) in column D as in Spreadsheet 2, entering 1 in cell D10, = D10/(1+B10) 
in cell D11 and copying down. 

We calculate the vector $y_x$ in column E, using the recursion (4.5). The entry in E10 is 1, 
the entry in E11 is 


= E10 * (1 + B10)^-1 * (1 - C10) 


and we copy down. 
We insert the cash flow vector c in column F, and calculate the value $\ddot{a}_x(c)$ in cell F8 with 
the formula 


=SUMPRODUCT($E10:$E129, F10:F129). 


For example at a constant interest rate of 6%, ásg(159) = 11.5957. 

For annual-premium deferred annuities, we enter the premium pattern vector in column 
H. We copy the formula in F8 to H8 (reserving column G for a later purpose). In I6, we enter 
= F8/H8, the initial premium. In I10, we enter =$I$6*H10 and copy down, which inserts the 
premium vector in column I. Copying the formula in H8 to I8 provides a check. We should get 
the same value as in F8. 

As a check, at 6% interest, for an annuity on (50) providing 1 per year for life beginning at age 60, 
the level annual premium payable for 10 years is 0.855. 

We will leave it to the interested reader to modify this spreadsheet to handle guaranteed 
periods. 

Note that this spreadsheet can be used to handle short duration problems with the mortality 
rates given individually, such as we have in the examples and exercises. We can ignore the 
given age, and simply insert the values of $q_{x+t}$ that we want directly in column C. Remember 
however not to save changes when closing, or alternatively, copy up in column C to restore 
the formulas when you next use the spreadsheet. 
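For readers who prefer a script to a spreadsheet, the column layout above can be mirrored in Python. This is only a sketch under stated assumptions: the parameter values are taken as printed above, the limiting age of 120 is inferred from rows N10 to N129, and all function names are illustrative. It is not claimed to reproduce the check figures quoted in this section.

```python
import math

A, B = 0.0005, 1.09      # parameters of cells N3 and N4, as printed above
OMEGA = 120              # assumption: rows N10..N129 hold ages 0..119


def mortality_table():
    """Column N: q_y = 1 - exp(-A * B**y) for ages 0..118, and q = 1 at age 119."""
    q = [1.0 - math.exp(-A * B ** age) for age in range(OMEGA - 1)]
    q.append(1.0)
    return q


def y_column(x, rate, q):
    """Column E: y_x(0), y_x(1), ... by the recursion, at a constant interest rate."""
    y = [1.0]
    for t in range(OMEGA - x - 1):
        y.append(y[-1] / (1.0 + rate) * (1.0 - q[x + t]))
    return y


def annuity_value(c, y):
    """Cell F8: the SUMPRODUCT of the y_x column and the cash flow column."""
    return sum(ck * yk for ck, yk in zip(c, y))


q = mortality_table()
y = y_column(50, 0.06, q)
whole_life = annuity_value([1.0] * len(y), y)   # 1 per year for life on (50)
print(whole_life)
```

Changing `A`, `B` or the interest rate plays the same role as editing cells N3, N4 or column B in the spreadsheet.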


Exercises 


Type A exercises 


4.1 Redo Example 4.1, only suppose now that interest rates are 50% for the first 2 years 
and 100% after that. 


4.2 A group of individuals age 40 each invest 1000 in a fund earning interest at 5%. At the 
end of 20 years the fund is divided equally among the survivors. If ${}_{20}p_{40} = 0.8$, how 
much does each get? 


4.3 You are given $q_{60} = 0.20$, $q_{61} = 0.25$, $q_{62} = 0.40$, $q_{63} = 0.50$. The interest rate is a 
constant 5% for the first 2 years, and 7% after that. A 5-year life annuity on (60) 
provides for payments of 100(1 + k) at time k, where k = 0, 1, 2, 3, 4. 

(a) Find the present value. 

(b) Suppose that instead of being a straight life annuity, the first three annuity payments 
are guaranteed regardless of whether (60) is alive or not. Find the present value 
now. 


4.4 You are given that interest is constant and that 

$$\ddot{a}_{40}(1_{30}) = 21, \qquad \ddot{a}_{60}(1_{10}) = 6, \qquad v^{20}\,{}_{20}p_{40} = 0.8.$$

Calculate $\ddot{a}_{40}(1_{20})$. 


4.5 The present values of a 10-year life annuity on (60), and a 10-year compound interest 
annuity, with annual payments of 1, are given, respectively, by $\ddot{a}_{60}(1_{10}) = 5$ and 
$\ddot{a}(1_{10}; v) = 6$. A person age 60 has 100 000 that he plans to pay as a single premium 
for a life annuity, beginning at age 60. This will provide a level income of 12 500 per 
year for life. What will the yearly income be if, instead of a straight life annuity, he 
purchases a life annuity with a 10-year guaranteed period? 


Type B exercises 


4.6 Mortality is given by $q_{52} = 0.1$, $q_{53} = 0.2$. The investment discount function v is given 
by $i_0 = 0.2$, $i_1 = 0.25$, $i_2 = 0.25$, $i_3 = 0.5$. Calculate $(y_{50} \circ 2)(2)$ and $y_{52}(2)$, using v in 
both cases, to show that $y_x \circ k$ is not equal to $y_{x+k}$ when interest is not constant. 


4.7 If interest is constant and $q_x$ is a constant q for all x, find an expression for $\ddot{a}_x$ in terms 
of v and q. 


4.8 Show that, at a constant zero rate of interest, $\ddot{a}_x(0, 1_\infty) = e_x$. Give an intuitive explanation 
of this fact. 


4.9 A life annuity on (x) provides for annual benefit payments for life beginning at age 
x. The initial benefit payment is 1000 and each subsequent payment increases by 6% 
(i.e., the benefit payment at age x + 1 is 1060, the payment at age x + 2 is $1000(1.06)^2$, 
and so on). The first 10 benefit payments are guaranteed and will be made regardless 
of whether (x) is alive or not. The interest rate is a constant 6%. You are given that 
${}_{10}p_x = 0.9$ and $e_{x+10} = 10$. Find the present value of this contract. 


4.10 A life annuity contract on (80) has a present value of 3.14. The annuity benefits at both 
age 80 and 81 are 1. The interest rate in the first year is 25% and $p_{80} = 0.8$. Suppose 
that the value of $p_{81}$ increases by 10% while all other mortality rates remain unchanged. 
What is the new present value of the contract? 


4.11 Given that $\ddot{a}_{40} = 15$, $\ddot{a}_{[40]+25} = 10$, v(25) = 0.5, ${}_{25}p_{40} = 0.4$, find the net annual premium, 
payable for 25 years beginning at age 40, for a deferred annuity, paying 1000 
yearly for life, with the first benefit payment at age 65. 


4.12 Suppose that for all x, $q_{x+1} \ge q_x$. Show that $\ddot{a}_{x+1} \le \ddot{a}_x$. Does this remain true if we 
remove the monotone condition on $q_x$? 


Spreadsheet exercises 


The following exercises are to be done using the sample life table of Section 3.7. 


4.13 A deferred life annuity on (40) provides for income of 10 000 per year beginning at 
age 65. Nothing is paid if death occurs before age 65. This is purchased by annual 
premiums payable for 25 years beginning at age 40. Premiums increase each year by 
10% of the initial premium, that is, $\pi_k = (1 + 0.1k)\pi_0$. Interest rates are 5% for the first 
10 years and 6% thereafter. Find the initial premium $\pi_0$. 


4.14 A group of individuals age 30 each agree to invest 1000 per year for 40 years beginning 
at age 30. At time 40, the fund is divided up among all the survivors. If interest rates 
are 5% for the next 10 years, 6% for the following 10 years and 8% after that, how 
much does each survivor receive? 


4.15 A single-premium life annuity on (x) provides 1 unit per year for life, beginning at 
time 1, with an n-year guaranteed period, where n is the smallest integer greater than 
or equal to the premium. (This is known as an instalment refund annuity. It guarantees 
that at least the full amount of the premium, without interest, will be returned.) Assume 
a constant interest rate of 4%. Find n and the single premium if (a) x = 40, (b) x = 
70. (Note that there is no direct method of calculation. A trial and error procedure is 
called for.) 


4.16 Interest rates are 8% for the first 25 years and 6% thereafter. Compare $\ddot{a}_{[40]+20}$ and 
$\ddot{a}_{[50]+10}$. Which one of these is smaller? Answer this before any calculation, and then 
do the actual calculation to verify your answer. 


Life insurance 


5.1 Introduction 


A life insurance policy is a contract between the insurer and another party known as the 
policyholder. In return for a payment of premiums, the insurer will pay a predetermined 
amount of money, known as a death benefit, upon the death of the policyholder. The amount 
of the benefit can vary with the time of death. In practice, this money will be paid immediately 
upon death (or more realistically, a short time after, to allow for processing the claim) but for 
mathematical convenience we assume in this chapter that it will be paid at the end of the year 
of death. For example, if the policyholder purchases a policy on January 1 and dies a week
later, our assumption means that the death benefit will not be paid until December 31. We will 
consider the more realistic situation of payment at the moment of death in Chapter 8. 

The reader should distinguish carefully between life annuity and life insurance contracts. 
The life annuity provides a sequence of periodic benefit payments. The typical life insurance 
contract provides only a single benefit payment, paid on the occasion of death. 


5.2 Calculating life insurance premiums 


Consider a policy on (x). Let $b_k$ be the amount that will be paid at time k + 1 for death between
time k and time k + 1. We will refer to the vector

$$\mathbf{b} = (b_0, b_1, \ldots, b_{\omega-x-1})$$

as the death benefit vector.

Notation The reader is cautioned that many authors use the subscript on b to refer to the
time of payment. What we call $b_k$, they would call $b_{k+1}$, since it is paid at the end of the year,
which is time k + 1. We prefer the convention above. All our vectors are then indexed from 0
to $\omega - x - 1$. In particular, this facilitates matters when dealing with contracts that combine
annuity and insurance benefits.


Suppose we have fixed an investment discount function v and a life table. We want to 
calculate the net single premium for the above policy, which we will denote by $A_x(\mathbf{b})$. (The A
is a standard symbol that probably came from the word ‘assurance’, an older version of the 
word ‘insurance’.) The principle to determine this is the same as that used for annuities. The 
total premiums, together with all investment earnings, must be sufficient to provide all the 
death benefits. 

To derive the formulas, we will follow the annuity procedure and begin with the case
where $\mathbf{b} = \mathbf{e}^k$. This is a policy in which 1 is paid at time k + 1 providing (x) dies between
the ages of x + k and x + k + 1. All of the other death benefits are of zero amount. Suppose
that we have $\ell_x$ people age x who each purchase this same contract. Out of these, there will
be $d_{x+k}$ individuals who die between the ages of x + k and x + k + 1, and each of them will
receive 1 at time k + 1. The total present value of all these death benefits will be $v(k+1)\,d_{x+k}$.

This must be equal to the total amount collected in premiums, which will be $\ell_x A_x(\mathbf{e}^k)$.
Therefore

$$A_x(\mathbf{e}^k) = v(k+1)\,\frac{d_{x+k}}{\ell_x}.$$

The general policy can be viewed as a sequence of 1-year policies as above, one for each
value of k, where the kth policy pays $b_k$ at time k + 1 for death in the previous year. The
premium for such a 1-year policy is just $b_k\,v(k+1)\,d_{x+k}/\ell_x$ and the total premium is obtained
by summation. We have

$$
\begin{aligned}
A_x(\mathbf{b}) &= \sum_{k=0}^{\omega-x-1} b_k\, v(k+1)\, \frac{d_{x+k}}{\ell_x} \\
&= \sum_{k=0}^{\omega-x-1} b_k\, v(k+1)\, ({}_kp_x - {}_{k+1}p_x) \\
&= \sum_{k=0}^{\omega-x-1} b_k\, v(k+1)\, {}_kp_x\, q_{x+k}.
\end{aligned} \tag{5.1}
$$

The reader should note that the expression above follows the same pattern as the annuity 
single premium. It is a pattern that we will encounter many more times in subsequent material. 
Namely, we sum up a number of terms, each of which consists of three factors, 


amount × interest discount factor × probability that payment is made. (5.2)


In the case of insurance, formulas [3.5(a)-(c)] give three different expressions for the 
probability that the payment will be made, giving us the three different ways of writing $A_x(\mathbf{b})$.
Each will be useful in certain cases.

As with annuities, the notation will suppress the death benefit vector $\mathbf{1}_\infty = \mathbf{1}_{\omega-x}$. Accordingly,
$A_x$ will be the net single premium for a policy paying 1 at the end of the year of death,
whenever it occurs.

Example 5.1 Suppose that $q_{60} = 0.2$, $q_{61} = 0.4$, $q_{62} = 0.5$ and i = 100%. A policy sold to
(60) provides for benefits at the end of the year of death of 80 for death in the first year, 75
for death in the second year, and 100 for death in the third year. If the insured lives to age 63,
the policy terminates and nothing is paid. Find the net single premium.


Benefits   [80]   [75]   [100]
Time       0      1      2      3
Pattern    1      2      0.75
v's        1/2    1/2    1/2
q's        0.2    0.4    0.5


Figure 5.1 Example 5.1


Solution. We again can make use of a time diagram as shown in Figure 5.1. The death benefit 
amount is put at the beginning of the year to which it is applicable. That is, $b_k$ is inserted
above time k. Our convention here is to enclose these amounts in a box, to distinguish them
from annuity benefits. The amount $b_k$ must be multiplied by the previous interest discount
factors, as well as by the interest discount factor for the year starting at time k, since it is paid
at the end of the year. Moreover it is multiplied by the previous p values and as well by $q_{x+k}$,
as indicated in the third formula in (5.1), which together give the probability of living to time
k and then dying in the following year. In this case, the net single premium is

$$(80 \times \tfrac{1}{2} \times 0.2) + (75 \times \tfrac{1}{4} \times 0.8 \times 0.4) + (100 \times \tfrac{1}{8} \times 0.8 \times 0.6 \times 0.5) = 8 + 6 + 3 = 17.$$

As with life annuities, one can as an alternative construct a partial life table, which
automatically gives you the multiplication of the probabilities, and then use the first formula
in (5.1). So for example in this case, starting with $\ell_{60} = 1000$, we have in turn $d_{60} = 200$,
$\ell_{61} = 800$, $d_{61} = 320$, $\ell_{62} = 480$, $d_{62} = 240$.
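
The hand calculation above can be mirrored in a few lines of code. The following is a minimal Python sketch of the third expression in (5.1); the function name and the argument layout are illustrative assumptions rather than notation from the text.

```python
def insurance_single_premium(benefits, q, rates):
    """Net single premium A_x(b) via the third expression in (5.1).

    benefits[k] is b_k, q[k] is q_{x+k}, and rates[k] is the interest rate
    for the year from time k to time k + 1 (hypothetical argument names).
    """
    premium = 0.0
    discount = 1.0  # v(k): discount factor from time k back to time 0
    survival = 1.0  # k_p_x: probability that (x) is alive at time k
    for b_k, q_k, i_k in zip(benefits, q, rates):
        discount /= 1.0 + i_k  # now v(k + 1), since payment is at year end
        premium += b_k * discount * survival * q_k
        survival *= 1.0 - q_k  # now (k+1)_p_x
    return premium


# Data of Example 5.1: i = 100% each year, so each yearly factor is 1/2.
example_5_1 = insurance_single_premium([80, 75, 100], [0.2, 0.4, 0.5], [1.0, 1.0, 1.0])
print(round(example_5_1, 2))  # 17.0, matching the hand calculation
```

Passing a list of yearly rates rather than a single rate keeps the sketch usable when the discount function v is not based on constant interest.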


Normally, the policyholder does not pay for insurance by a single premium but rather by 
a sequence of periodic premiums. We assume in this chapter that premiums are paid annually. 
The premiums will be given by a premium vector $\boldsymbol{\pi} = (\pi_0, \pi_1, \ldots, \pi_{\omega-x-1})$ as defined for
deferred annuities in Section 4.5. Following the principle used there, we want the net single
premium to be equal to the present value of the premiums, with respect to the interest and
survivorship accumulation function. This means that

$$A_x(\mathbf{b}) = \ddot{a}_x(\boldsymbol{\pi}) = \pi_0\,\ddot{a}_x(\boldsymbol{\rho}),$$

so that

$$\pi_0 = \frac{A_x(\mathbf{b})}{\ddot{a}_x(\boldsymbol{\rho})}, \tag{5.3}$$

where $\boldsymbol{\rho}$ is the premium pattern vector.

We follow the terminology used for single premiums and call these premiums net annual
premiums or annual benefit premiums. We will later encounter examples of premiums that 
are different from net premiums, but our convention is that unless otherwise mentioned, all 
premiums will be net. 


Example 5.2 Suppose that the insurance in Example 5.1 is to be purchased by three annual 
premiums, beginning at age 60, where the second premium is double the first, and the third 
premium is three quarters of the first. Find the premiums. 


Solution. The premium pattern vector is given by $\boldsymbol{\rho} = (1, 2, 0.75)$. We have

$$\ddot{a}_{60}(\boldsymbol{\rho}) = 1 + (2 \times \tfrac{1}{2} \times 0.8) + (0.75 \times \tfrac{1}{4} \times 0.8 \times 0.6) = 1.89,$$

so that the initial premium will be 17/1.89 = 8.99. The insured will then pay 8.99 in the first
year, 17.99 in the second year and 6.75 in the third year.
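
As a check on Example 5.2, here is a small Python sketch of formula (5.3). The helper `annuity_value` and the variable names are assumptions made for illustration; the single premium of 17 is taken from Example 5.1.

```python
def annuity_value(payments, q, rates):
    """Present value of payments[k] made at time k if (x) is then alive."""
    value = 0.0
    discount = 1.0  # v(k)
    survival = 1.0  # k_p_x
    for c_k, q_k, i_k in zip(payments, q, rates):
        value += c_k * discount * survival
        discount /= 1.0 + i_k
        survival *= 1.0 - q_k
    return value


q = [0.2, 0.4, 0.5]
rates = [1.0, 1.0, 1.0]
pattern = [1, 2, 0.75]   # premium pattern vector of Example 5.2
single_premium = 17.0    # net single premium found in Example 5.1

pattern_value = annuity_value(pattern, q, rates)
initial_premium = single_premium / pattern_value  # formula (5.3)
premiums = [round(initial_premium * r, 2) for r in pattern]
print(round(pattern_value, 2), premiums)  # 1.89 [8.99, 17.99, 6.75]
```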


5.3 Types of life insurance 


Life insurance policies are traditionally classified into different types. Term insurance provides 
death benefits for a fixed number of years (similar to the temporary annuity). After the 
expiration of the term, coverage ceases and there are no more benefits. Example 5.1 involved 
such a policy, with a term of 3 years. Whole life insurance provides death benefits for life, so 
some payout on the policy is certain to occur. In our mathematical model we will not need 
to distinguish between the two types. We will assume that all policies will continue to age $\omega$.
For term insurance running for n years (which we refer to as n-year term) we simply will have
$b_k = 0$ for $k \ge n$. Nonetheless, in the next chapter we will see that there are differences in the
nature of these two types and that whole life insurance has a savings component as well as an
insurance component. Another common type of contract is endowment insurance, which we
will discuss in detail in the next section. A more modern development, known as universal
life, will be described in Chapter 13.


5.4 Combined insurance-annuity benefits 


It is possible to combine both life insurance benefits and life annuity benefits in the same 
contract. One of the most popular types of such a policy is known as endowment insurance. 
This provides for a payment at some future time n if (x) is then alive, and in addition a death 
benefit if (x) dies before time n. Such a policy is known as n-year endowment insurance 
or endowment insurance at age x + n, since it combines a pure endowment (as defined in
Section 4.2) with insurance. It is usually marketed as a savings plan, whereby the policyholder,
by paying premiums each year, will accumulate a certain sum at time n. In addition, the 
policyholder is protected with life insurance if he/she dies before accumulating the desired 
amount. 

It should be noted that a whole life policy is similar to endowment insurance since the
policyholder is guaranteed some death benefit. Mathematically it can be viewed as endowment
insurance at age $\omega$, and in practice it actually is interpreted in this way. Since it is to the
advantage of the policyholder, the insurer will assume that everybody dies by age $\omega$, unlike
the case of a life annuity, and pay the death benefit at age $\omega$ to all survivors.

To calculate the single premium for endowment insurances, we simply view it as two 
separate contracts and add the premiums. 


Example 5.3 An insurance policy on (x) provides 1 unit payable at time n if (x) is then 
alive, plus 1 unit payable at the end of the year of death if (x) dies before time n. Level annual 
premiums of P are payable for n years. Find a formula for P. 


Solution. The single premium for the death benefit is $A_x(\mathbf{1}_n)$. The single premium for the
pure endowment is $y_x(n)$. The premium pattern vector $\boldsymbol{\rho}$ is $\mathbf{1}_n$. The total single premium for
the contract is then $A_x(\mathbf{1}_n) + y_x(n)$ and

$$P = \frac{A_x(\mathbf{1}_n) + y_x(n)}{\ddot{a}_x(\mathbf{1}_n)}.$$


Example 5.3 is a very common type of endowment insurance with a level death benefit 
equal to the pure endowment amount, and level premiums payable for the full term. This is 
not essential, however, and many other combinations are possible. 


Example 5.4 Consider a 20-year endowment insurance with the death benefit equal to 
1 unit for the first 10 years, and 2 units for the second 10 years. The amount of the pure
endowment is 3 units. Level annual premiums of P are payable for 15 years. Find a formula
for P.


Solution. Calculating as in the previous example,

$$P = \frac{A_x(\mathbf{1}_{10}, \mathbf{2}_{10}) + 3\,y_x(20)}{\ddot{a}_x(\mathbf{1}_{15})}.$$


Example 5.5 (Figure 5.2). Suppose that $q_{60} = 0.2$, $q_{61} = 0.4$ and i = 100%. A 2-year policy
provides for benefits at the end of the year of death of 80 for death in the first year, and 75
for death in the second year. In addition, there is a pure endowment of 70 paid at age 62 if
the policyholder is then alive. This is purchased by two level annual premiums. Calculate the
premium.


Benefits   [80]   [75]   70
Time       0      1      2
Pattern    1      1
v's        1/2    1/2
q's        0.2    0.4


Figure 5.2 Example 5.5

Solution. The present value of death benefits is

$$A_{60}(80, 75) = (80 \times \tfrac{1}{2} \times 0.2) + (75 \times \tfrac{1}{4} \times 0.8 \times 0.4) = 14.$$

The present value of the pure endowment is

$$\ddot{a}_{60}(0, 0, 70) = 70 \times \tfrac{1}{4} \times 0.8 \times 0.6 = 8.4.$$

The present value for the premium pattern vector is

$$\ddot{a}_{60}(\boldsymbol{\rho}) = \ddot{a}_{60}(1, 1) = 1 + \tfrac{1}{2} \times 0.8 = 1.4.$$

By (5.3),

$$\pi = \frac{14 + 8.4}{1.4} = 16.$$
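
The three present values in Example 5.5 can be reproduced with a short Python sketch. The helper `discount_survival`, which tabulates y(k) = v(k) times the k-year survival probability, is an assumed name introduced only for this illustration.

```python
def discount_survival(q, rates):
    """Return [y(0), y(1), ..., y(n)], where y(k) = v(k) * k_p_x."""
    y = [1.0]
    for q_k, i_k in zip(q, rates):
        y.append(y[-1] * (1.0 - q_k) / (1.0 + i_k))
    return y


q = [0.2, 0.4]      # q_60, q_61 from Example 5.5
rates = [1.0, 1.0]  # i = 100% in each year
y = discount_survival(q, rates)

death_benefits = [80, 75]
# Third form of (5.1): b_k * y(k) * v(k, k+1) * q_{x+k}.
insurance_pv = sum(b * y[k] * q[k] / (1.0 + rates[k])
                   for k, b in enumerate(death_benefits))
endowment_pv = 70 * y[2]       # pure endowment paid at time 2 if alive
premium_annuity = y[0] + y[1]  # level premium pattern (1, 1)

level_premium = (insurance_pv + endowment_pv) / premium_annuity  # (5.3)
print(round(insurance_pv, 2), round(endowment_pv, 2), round(level_premium, 2))
# 14.0 8.4 16.0
```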


Another common type of combined policy is a deferred annuity that provides for death 
benefits during the deferred period. This is somewhat similar in nature to endowment insur- 
ance. The difference is that the accumulated savings are paid out as an annuity rather than as 
a single payment. 


Example 5.6 (Figure 5.3). A contract on (40) provides for annuity benefits of 1 per year 
for life, beginning at age 65. If (40) dies before age 65, a death benefit of 10 will be paid at 
the end of the year of death. Level annual premiums of P are payable for 25 years. Find a 
formula for P. 


Benefits   [10]   [10]   [10]   ...   [10]   1      1      ...
Time       0      1      2      ...   24     25     26
Pattern    1      1      1      ...   1


Figure 5.3 Example 5.6 


Solution. We calculate this, as in the examples above, by adding the single premiums for the 
death benefits and the annuity benefits and dividing by the annuity for the premium pattern 
vector. The result is 


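$$P = \frac{10\,A_{40}(\mathbf{1}_{25}) + \ddot{a}_{40}(\mathbf{0}_{25}, \mathbf{1}_\infty)}{\ddot{a}_{40}(\mathbf{1}_{25})}.$$

The second term in the numerator can be written alternatively as $v(25)\,{}_{25}p_{40}\,\ddot{a}_{[40]+25}$.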


An interesting variation on the above involves a popular marketing device, which is to 
issue policies with a return-of-premium feature.


Example 5.7 (Figure 5.4). A contract on (40) provides for annuity payments of 1 per year
for life beginning at age 65. Should (40) die before age 65, there will be a return of all
premiums (without interest) paid prior to death. Level annual premiums of P are paid for 25
years. Find a formula for P.


Benefits   [P]    [2P]   [3P]   ...   [25P]  1      1      ...
Time       0      1      2      ...   24     25     26
Pattern    1      1      1      ...   1


Figure 5.4 Example 5.7 


Solution. This is basically the same type of problem as above, except that the benefits depend 
on the unknown P. We cannot obtain P directly, but we can set up an equation to solve for 
it. Suppose (x) dies between time k and k + 1, where k is between 0 and 24 inclusive. The 
insured will have paid k + 1 premiums of P and these will be returned at time k + 1. The death 
benefit vector is then 


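$$\mathbf{b} = (P, 2P, 3P, \ldots, 25P) = P\,\mathbf{j},$$

where $\mathbf{j} = (1, 2, \ldots, 25)$. Equating present values of the premiums and benefits,

$$P\,\ddot{a}_{40}(\mathbf{1}_{25}) = P\,A_{40}(\mathbf{j}) + \ddot{a}_{40}(\mathbf{0}_{25}, \mathbf{1}_\infty),$$

from which

$$P = \frac{\ddot{a}_{40}(\mathbf{0}_{25}, \mathbf{1}_\infty)}{\ddot{a}_{40}(\mathbf{1}_{25}) - A_{40}(\mathbf{j})}. \tag{5.4}$$
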
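
To see formula (5.4) at work numerically, the following Python sketch evaluates it under loudly hypothetical assumptions: the mortality function `hypothetical_q`, the limiting age of 110 and the 5% interest rate are all invented for illustration and are not the sample life table of Section 3.7.

```python
def hypothetical_q(age, omega=110):
    """Illustrative mortality rates only (an assumption, not the book's table)."""
    if age >= omega - 1:
        return 1.0
    return min(1.0, 0.0005 * 1.09 ** (age - 30))


def return_of_premium_annual_premium(issue_age=40, deferral=25, rate=0.05, omega=110):
    horizon = omega - issue_age
    v = 1.0 / (1.0 + rate)
    y = [1.0]  # y[k] = v(k) * k_p_x
    for k in range(horizon):
        y.append(y[-1] * v * (1.0 - hypothetical_q(issue_age + k, omega)))

    premium_annuity = sum(y[:deferral])          # a_40(1_25)
    deferred_annuity = sum(y[deferral:horizon])  # a_40(0_25, 1_inf)
    # A_40(j) with j_k = k + 1: death in year k + 1 refunds k + 1 premiums.
    refund_insurance = sum(
        (k + 1) * y[k] * v * hypothetical_q(issue_age + k, omega)
        for k in range(deferral)
    )
    premium = deferred_annuity / (premium_annuity - refund_insurance)  # (5.4)
    return premium, premium_annuity, refund_insurance, deferred_annuity


premium, a_prem, a_refund, a_def = return_of_premium_annual_premium()
print(premium, a_def / a_prem)  # premium with and without the refund feature
```

The second printed figure is the premium for the same deferred annuity with no death benefit; the refund feature can only raise the premium, which gives a quick sanity check on the output.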
Example 5.8 Consider the same policy as above, except that the premiums are returned 
with interest as determined by the function v. Find a formula for P. 


Solution. One method is to simply use (5.4), where the vector j is changed from $j_k = k + 1$
to

$$j_k = \mathrm{Val}_{k+1}(\mathbf{1}_{k+1}; v).$$

Another approach, which produces the solution in a much easier form for calculation, is
found through the following reasoning. In the first 25 years the accumulation is according to
the discount function v rather than $y_{40}$. To see this clearly, consider the room-box description
of Chapter 4. Upon death of a participant before age 65, the person’s box is opened up, but
the amount paid back as a death benefit is exactly the same as the amount that was there in
the first place. The other participants will receive no survivorship earnings. Therefore, each
survivor will have accumulated at age 65 the amount $P\,\mathrm{Val}_{25}(\mathbf{1}_{25}; v)$. This amount must be
sufficient to provide an annuity of 1 unit per year for life beginning at age 65. Thus

$$P = \frac{\ddot{a}_{[40]+25}}{\mathrm{Val}_{25}(\mathbf{1}_{25}; v)}.$$

We will present an alternative solution to this problem in Section 6.7. 
A generalization of the above is concerned with partial refunds. 


Example 5.9 Now consider Example 5.8, except that the payment upon death is k times
the accumulated premiums with interest, where $0 \le k \le 1$.


Solution. For each premium P, the portion kP accumulates at interest only and (1 − k)P accumulates
with interest and survivorship. Equating values at time 25,

$$k\,P\,\mathrm{Val}_{25}(\mathbf{1}_{25}; v) + (1 - k)\,P\,\mathrm{Val}_{25}(\mathbf{1}_{25}; y_{40}) = \ddot{a}_{[40]+25}$$

and we can solve to get

$$P = \frac{\ddot{a}_{[40]+25}}{k\,\mathrm{Val}_{25}(\mathbf{1}_{25}; v) + (1 - k)\,\mathrm{Val}_{25}(\mathbf{1}_{25}; y_{40})}.$$

For k = 1 this reduces to the solution to Example 5.8 and for k = 0, it is a special case of 
Example 4.5. 
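
The partial-refund formula lends itself to a quick numerical experiment. The Python sketch below is an illustration under assumed inputs: the mortality function `hypothetical_q`, the limiting age of 110 and the constant 5% interest rate are invented, and the function names are not from the text.

```python
def hypothetical_q(age, omega=110):
    """Illustrative mortality rates only (an assumption, not the book's table)."""
    if age >= omega - 1:
        return 1.0
    return min(1.0, 0.0005 * 1.09 ** (age - 30))


def partial_refund_premium(k_refund, issue_age=40, deferral=25, rate=0.05, omega=110):
    """Level premium when k_refund times the premiums with interest is refunded."""
    horizon = omega - issue_age
    v = 1.0 / (1.0 + rate)
    y = [1.0]  # y[t] = v(t) * t_p_40
    for t in range(horizon):
        y.append(y[-1] * v * (1.0 - hypothetical_q(issue_age + t, omega)))

    # Val_25(1_25; v): premiums of 1 at times 0..24, accumulated at interest only.
    val_interest = sum((1.0 + rate) ** (deferral - t) for t in range(deferral))
    # Val_25(1_25; y_40): accumulated with interest and survivorship.
    val_survivorship = sum(y[:deferral]) / y[deferral]
    # Value at time 25 of 1 per year for life from age 65.
    annuity_at_65 = sum(y[deferral:horizon]) / y[deferral]

    return annuity_at_65 / (k_refund * val_interest + (1.0 - k_refund) * val_survivorship)


for k_refund in (0.0, 0.5, 1.0):
    print(k_refund, partial_refund_premium(k_refund))
```

Since accumulation with survivorship always exceeds accumulation at interest alone, the denominator shrinks as k grows, so the premium should rise from the no-refund case (k = 0) to the full-refund case (k = 1).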


Remark For those preferring an alternative to the square bracket notation, there are various
other ways in which to write the numerator of the last two examples. For example,
$\mathrm{Val}_{25}(\mathbf{0}_{25}, \mathbf{1}_\infty; y_{40})$ or $\ddot{a}_{40}(\mathbf{0}_{25}, \mathbf{1}_\infty)/y_{40}(25)$. Of course with constant interest, the simplest choice
is just $\ddot{a}_{65}$.


The following variation is suggested by provisions in some pension plans where an 
employee must remain in the plan for a minimum period in order to get credit for premiums 
paid by the employer. A more realistic version will be given in Section 13.3. 


Example 5.10 Redo Example 5.8 only with the provision that premiums are returned with 
interest upon death only if (40) lives to age 45. Nothing is returned if death occurs in the first 
5 years. 


Solution. There does not appear to be any convenient way to apply the simplifying approach 
of the last two examples. We can however use (5.4) with the vector j given by 


$$j_k = \begin{cases} 0 & \text{if } 0 \le k < 5, \\ \mathrm{Val}_{k+1}(\mathbf{1}_{k+1}; v) & \text{if } 5 \le k < 25. \end{cases}$$


Indeed, (5.4) is a flexible formula which can be adapted to a variety of premium return 
provisions, including cases where the interest rate on the premiums may differ from that used 
in the discount function.

We now present an example involving a more complex calculation.


Example 5.11 An annuity on (x), purchased by a single premium S, provides for 1 per 
year for life, beginning at time 1. If (x) dies before a total income of S has been paid out, the 
difference between S and the total income received will be refunded at the end of the year of 
death, so that the policyholder will at least receive total income equal to the single premium. 
(e.g., if S is 20 and (x) dies between time 5 and time 6, a death benefit of 15 would be paid at 
time 6.) Describe a procedure for calculating S. 


Solution. The premium S must satisfy

$$S = \ddot{a}_x(0, \mathbf{1}_\infty) + A_x(\mathbf{b}), \quad \text{where } b_k = \max\{S - k, 0\}.$$

Since b depends on S, there is no direct way to solve this and an iterative numerical procedure
must be employed. One guesses at an initial value of S (it will be close to but greater than
$\ddot{a}_x(0, \mathbf{1}_\infty)$), and then continues to adjust the value of S until the right-hand side above also
equals this value. This can be carried out automatically in Excel with the goal-seek function.
Exercise 5.20 gives some particular examples. 
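
Outside a spreadsheet, the same trial-and-adjust procedure can be sketched as a fixed-point iteration. The Python code below is one possible approach, not the method prescribed by the text; the mortality function, the issue age of 65, the 4% rate and the limiting age of 110 are hypothetical inputs chosen only to make the sketch runnable.

```python
def hypothetical_q(age, omega=110):
    """Illustrative mortality rates only (an assumption, not the book's table)."""
    if age >= omega - 1:
        return 1.0
    return min(1.0, 0.0005 * 1.09 ** (age - 30))


def build_table(issue_age=65, rate=0.04, omega=110):
    """Return (q, y, v) with q[k] = q_{x+k} and y[k] = v(k) * k_p_x."""
    v = 1.0 / (1.0 + rate)
    q = [hypothetical_q(issue_age + k, omega) for k in range(omega - issue_age)]
    y = [1.0]
    for q_k in q:
        y.append(y[-1] * v * (1.0 - q_k))
    return q, y, v


def right_hand_side(s, q, y, v):
    """Annuity of 1 from time 1, plus insurance with b_k = max(S - k, 0)."""
    annuity = sum(y[1:])
    insurance = sum(max(s - k, 0.0) * y[k] * v * q[k] for k in range(len(q)))
    return annuity + insurance


def solve_cash_refund(q, y, v, tol=1e-10, max_iter=1000):
    s = sum(y[1:])  # initial guess: the annuity value without any refund
    for _ in range(max_iter):
        s_next = right_hand_side(s, q, y, v)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    raise RuntimeError("fixed-point iteration did not converge")


q, y, v = build_table()
single_premium = solve_cash_refund(q, y, v)
print(single_premium, sum(y[1:]))  # premium with and without the refund
```

Repeated substitution is a reasonable choice here because raising S by one unit raises the right-hand side by less than one unit when interest is positive, so the iterates settle down; a bisection search would serve equally well.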


The contract in Example 5.11 is usually termed a cash refund annuity and it is a variation on
the instalment refund idea introduced in Exercise 4.15. In practice, both of these are somewhat 
more complicated than described since the refund is based on the gross premium rather than 
the net. 


5.5 Insurances viewed as annuities 


A comparison of (4.1) and the third expression in (5.1) shows that 
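$$A_x(\mathbf{b}) = \ddot{a}_x(\mathbf{c}) \quad \text{where } c_k = v(k, k+1)\,b_k\,q_{x+k}. \tag{5.5}$$

Let $\mathbf{w}_x$ denote the vector with entries $(\mathbf{w}_x)_k = v(k, k+1)\,q_{x+k}$. We can then write (5.5) in
compact form as

$$A_x(\mathbf{b}) = \ddot{a}_x(\mathbf{w}_x * \mathbf{b}). \tag{5.6}$$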


We can verify this intuitively as follows. Suppose (x) wishes to pay a single premium 
for a life insurance policy running over several years, based on the death benefit vector b. 
The insurer refuses, claiming that it only sells life insurance policies for 1 year at a time. It 
does, however, sell life annuities. The person therefore purchases a life annuity with benefits 
of $v(k, k+1)\,b_k\,q_{x+k}$ at time k if she is alive. She does not keep this annuity payment, but
immediately returns it to the insurer to purchase a 1-year life insurance policy paying a death
benefit at time k + 1 if death occurs in the next year. This single premium will purchase a
death benefit of exactly $b_k$. This follows from our previous discussion, but to re-emphasize
the point, note that this premium will accumulate to $b_k\,d_{x+k}/\ell_{x+k}$ at the end of the year, and
assuming $\ell_x$ people engage in this scheme, this amount collected from each of $\ell_{x+k}$ survivors
at time k will be enough to provide $b_k$ to each of the $d_{x+k}$ people who die during the year.

This illustration is of course fanciful, as all insurers sell policies for periods of more than 1
year. (Moreover, as a practical matter it overlooks the fact that 1-year premiums could change 
over time, as well as the fact that annuity and insurance premiums are based on different 
tables.) It does, however, provide a useful point of view. Each policyholder can look upon a 
life insurance policy as a life annuity, providing what is essentially that person’s share of the 
death benefits. One could suppose that these annuity payments are then collected in a separate 
fund and used to pay all the benefits to those who die during the year. This is often a valuable 
way of looking at the situation, since all policies can be thought of as life annuities if we 
wish. The calculation of premiums and other quantities can then be reduced to the general 
principles outlined in Chapter 2. 

This viewpoint allows us to immediately adapt all results obtained for life annuities to the 
insurance setting. For example, the splitting identity for life annuities (4.8) takes the following 
form for insurances: 


$$A_x(\mathbf{b}) = A_x({}_k\mathbf{b}) + y_x(k)\,A_{[x]+k}(\mathbf{b} \circ k), \tag{5.7}$$

where the bracket on x indicates, as with annuities, that the discount function used is $v \circ k$.
We can also maintain the room-box visualization for life insurance policies, which we 
introduced for annuities in the previous chapter, and which will provide a useful guide in the 
next chapter. 
The reader is cautioned that while (5.6) is useful conceptually and also for spreadsheet 
calculation (as illustrated in Section 5.9), it is not always the best for hand calculation. For 
such purposes, the procedure used in Example 5.1 is usually less subject to arithmetical errors. 
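
For readers who want to see (5.6) in action, the following Python sketch recomputes Example 5.1 by first forming the vector of entries v(k, k + 1) q_{x+k} b_k and then valuing it as a life annuity. The variable names are illustrative assumptions.

```python
q = [0.2, 0.4, 0.5]      # q_60, q_61, q_62 from Example 5.1
rates = [1.0, 1.0, 1.0]  # i = 100% in each year
b = [80, 75, 100]        # death benefit vector

# Entries of w_x: one-year discount factor times one-year death probability.
w = [q_k / (1.0 + i_k) for q_k, i_k in zip(q, rates)]
# Annuity benefit vector c = w_x * b (entrywise product), as in (5.5).
c = [w_k * b_k for w_k, b_k in zip(w, b)]

# Value c as a life annuity: payment c_k at time k if (x) is then alive.
value, discount, survival = 0.0, 1.0, 1.0
for c_k, q_k, i_k in zip(c, q, rates):
    value += c_k * discount * survival
    discount /= 1.0 + i_k
    survival *= 1.0 - q_k

print([round(c_k, 2) for c_k in c], round(value, 2))  # [8.0, 15.0, 25.0] 17.0
```

The annuity payments 8, 15 and 25 are the one-year term premiums that a survivor would pay at times 0, 1 and 2, and their present value agrees with the net single premium of 17.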


5.6 Summary of formulas 


In this section we summarize the procedure, developed in the last two chapters, for calculating 
an annual premium on a general life insurance—annuity contract. We first identify four pertinent 
vectors: b, the death benefit vector; c, the life annuity benefit vector; $\mathbf{u} = (\mathbf{0}_g, \mathbf{w})$, the vector
of payments that are guaranteed provided (x) lives to age x + g; and $\boldsymbol{\rho}$, the premium pattern
vector. In many cases only one or two of the first three vectors will be applicable, and the
others will be set equal to the zero vector. We then calculate the initial premium from the
equation

$$\pi_0\,\ddot{a}_x(\boldsymbol{\rho}) = A_x(\mathbf{b}) + \ddot{a}_x(\mathbf{c}) + v(g)\,{}_gp_x\,\ddot{a}(\mathbf{w}; v \circ g).$$

This allows for cases where the vectors on the right-hand side can themselves depend on $\pi_0$.


5.7 A general insurance-annuity identity


5.7.1 The general identity 


There is another useful relationship between insurances and annuities. 
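For the vector $\mathbf{w}_x$ introduced in Section 5.5, we can write

$$(\mathbf{w}_x)_k = v(k, k+1) - v(k, k+1)\,p_{x+k} = 1 - v(k, k+1)\,p_{x+k} - d_k,$$

where $d_k = 1 - v(k, k+1)$, and it follows that

$$\mathbf{w}_x = \nabla\mathbf{1}_\infty - \mathbf{d}$$

in the notation of (2.14) with respect to the discount function $y_x$. From (5.6)

$$A_x(\mathbf{b}) = \ddot{a}_x(\mathbf{w}_x * \mathbf{b}) = \ddot{a}_x(\nabla\mathbf{1}_\infty * \mathbf{b}) - \ddot{a}_x(\mathbf{d} * \mathbf{b})$$

and from (2.15) we obtain our main identity

$$A_x(\mathbf{b}) = \ddot{a}_x(\Delta\mathbf{b}) - \ddot{a}_x(\mathbf{d} * \mathbf{b}). \tag{5.8}$$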
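
Identity (5.8) is easy to test numerically. The Python sketch below does so for a 3-year policy with non-constant interest; the mortality rates, interest rates and benefits are hypothetical figures chosen for illustration, and the difference vector is taken with entries b_k − b_{k−1} (with b_{−1} = 0 and a final entry −b_{n−1} at time n), which is the reading consistent with the endowment example in the next subsection.

```python
q = [0.2, 0.4, 0.5]        # hypothetical one-year death probabilities
rates = [1.0, 0.25, 0.5]   # hypothetical, deliberately non-constant rates
b = [80.0, 75.0, 100.0]    # death benefit vector for a 3-year term policy
n = len(b)

# y[k] = v(k) * k_p_x for k = 0, ..., n
y = [1.0]
for q_k, i_k in zip(q, rates):
    y.append(y[-1] * (1.0 - q_k) / (1.0 + i_k))

v1 = [1.0 / (1.0 + i_k) for i_k in rates]  # one-year factors v(k, k+1)
d = [1.0 - v_k for v_k in v1]              # one-year discount rates d_k

# Left side of (5.8): A_x(b) from the third form of (5.1).
insurance = sum(b[k] * y[k] * v1[k] * q[k] for k in range(n))

# Right side of (5.8): annuity on the differences minus annuity on d * b.
delta_b = [b[0]] + [b[k] - b[k - 1] for k in range(1, n)] + [-b[n - 1]]
annuity_delta = sum(delta_b[k] * y[k] for k in range(n + 1))
annuity_db = sum(d[k] * b[k] * y[k] for k in range(n))

print(insurance, annuity_delta - annuity_db)  # the two sides should agree
```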


5.7.2 The endowment identity 


We will use (5.8) to derive a well-known actuarial formula, which we call the endowment 
identity. Assume constant interest. The constant discount rate d then factors out as a constant
multiple and $\ddot{a}_x(\mathbf{d} * \mathbf{b}) = d\,\ddot{a}_x(\mathbf{b})$. Suppose $\mathbf{b} = \mathbf{1}_n$. Then $\Delta\mathbf{b} = (1, 0, 0, \ldots, -1)$ where the
−1 is in position indexed with n (i.e., in the (n + 1)th entry since we start with 0). Equation
(5.8) says that

$$A_x(\mathbf{1}_n) = 1 - y_x(n) - d\,\ddot{a}_x(\mathbf{1}_n).$$

Let $A_{x:\overline{n}|}$ be the net single premium for an n-year, 1-unit endowment insurance. That is, 1 is
paid either at time n, or at the end of the year of death if that occurs before time n. (This is
the standard symbol for such a premium.) Adding $y_x(n)$ to both sides of this equation, we get
the endowment identity,

$$A_{x:\overline{n}|} = 1 - d\,\ddot{a}_x(\mathbf{1}_n). \tag{5.9}$$

This identity is analogous to (2.16) and its derivation as given at the end of Section 2.9.
For an interpretation, suppose I lend you 1 unit now, to be repaid in full at the end of n years,
or at the end of the year of your death if this occurs before n years. You must also pay interest
at the beginning of each year until the loan is paid. The present value of the loan must be
equal to the present value of the principal repayments, plus the present value of the interest.
The latter is just a temporary life annuity paying d units yearly for n years, beginning at time
zero. The present value of the loan is just 1 and the present value of the principal repayments
is just $A_{x:\overline{n}|}$, the net single premium for the endowment insurance. We obtain the equation
$1 = A_{x:\overline{n}|} + d\,\ddot{a}_x(\mathbf{1}_n)$, which gives (5.9).

If P denotes the level annual premium payable for n years for the 1-unit, n-year endowment
insurance, we can also express P in terms of annuities by dividing by $\ddot{a}_x(\mathbf{1}_n)$ in (5.9) to
obtain

$$P = \frac{1}{\ddot{a}_x(\mathbf{1}_n)} - d. \tag{5.10}$$

This identity can be interpreted as follows. Suppose you invest 1 unit. This will provide you
with interest earnings of d at the beginning of each year, until such time as you wish to
terminate the investment and take back your principal of 1. An alternate scheme is to use the
1 unit to purchase an n-year life annuity, paying $1/\ddot{a}_x(\mathbf{1}_n)$ at the beginning of each year for
n years, and to use part of these proceeds to purchase a 1-unit, n-year endowment insurance,
carrying level annual premiums of P for n years. The insurance will pay you back your
principal at the end of n years, or at death if earlier. Before recovering your principal you will
have net annual earnings of $1/\ddot{a}_x(\mathbf{1}_n) - P$, and this must equal the income of d that you would
get from the first alternative. This yields the given identity.
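
A small numerical check of (5.9) and (5.10) may help fix ideas. The Python sketch below uses the mortality and interest of Example 5.5 (two years, i = 100%) with a 1-unit benefit; the variable names are assumptions made for this illustration.

```python
q = [0.2, 0.4]  # q_60, q_61 as in Example 5.5
i = 1.0         # constant interest rate of 100%
n = len(q)
v = 1.0 / (1.0 + i)
d = 1.0 - v     # constant discount rate

# y[k] = v(k) * k_p_x for k = 0, ..., n
y = [1.0]
for q_k in q:
    y.append(y[-1] * v * (1.0 - q_k))

term_insurance = sum(y[k] * v * q[k] for k in range(n))  # A_x(1_n)
pure_endowment = y[n]                                    # y_x(n)
endowment_insurance = term_insurance + pure_endowment
temporary_annuity = sum(y[:n])                           # annuity on 1_n

# (5.9): endowment insurance equals 1 - d * annuity.
print(round(endowment_insurance, 4), round(1.0 - d * temporary_annuity, 4))
# 0.3 0.3

# (5.10): the level premium equals 1 / annuity - d.
level_premium = endowment_insurance / temporary_annuity
print(round(level_premium, 4), round(1.0 / temporary_annuity - d, 4))
# 0.2143 0.2143
```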


Remark This back-to-back annuity-insurance combination has recently become popular 
as an investment vehicle. At first glance, it appears that in practice it will produce a lower 
return than the straight investment, since one must pay expenses on both policies and, in 
addition, the different life tables used for annuity and insurances will work to the purchaser's 
disadvantage. (Our simplified model assumes no expenses and that the life table is the same in 
all cases.) However, the fact that the proceeds of the insurance and annuity contracts receive 
favourable tax treatment in many jurisdictions often means that the net after-tax return can 
actually be higher than that of the straight investment. 


The above formulas are of course true for a 1-unit whole life insurance, which is just 
endowment insurance at age $\omega$. We have

$$A_x = 1 - d\,\ddot{a}_x, \qquad P_x = \frac{1}{\ddot{a}_x} - d, \tag{5.11}$$

where $P_x$ is the net level annual premium payable for life for a 1-unit whole life contract
on (x).


5.8 Standard notation and terminology 


5.8.1 Single-premium notation


The standard symbol for an insurance single premium is A, as we have given it. As with
annuities, this is embellished with superscripts and subscripts to handle the common types of
death benefit vectors. For example:


• $A_x$ denotes $A_x(\mathbf{1}_\infty)$, as we have already indicated.


• $A^1_{x:\overline{n}|}$ denotes $A_x(\mathbf{1}_n)$, the net single premium for a 1-unit, n-year term insurance. The
superscript 1 above the x signifies that (x) must die before the expiration of the n-year
period in order to collect.


• $A_{x:\overline{n}|}$ denotes $A_x(\mathbf{1}_n) + y_x(n)$, the net single premium for a 1-unit n-year endowment
insurance, as we have indicated above. The subscript $x:\overline{n}|$ signifies that the death
benefit is paid upon the first ‘failure’ of (x) or the n-year period. The life (x) fails upon
death and the n-year period fails at the end of n years. Recall that the same subscript is
used in the temporary annuity standard symbol to signify that benefits are paid as long
as both the life (x) and the n-year period are ‘surviving’.


• $A_{x:\overline{n}|}^{\phantom{x:}1}$ is another symbol for ${}_nE_x$ (denoted by $y_x(n)$ in our notation). This is the net single
premium for a 1-unit, n-year pure endowment. The superscript 1 is now above the n
to signify that the n-year period must fail before (x) does in order for the contract to
pay.


• ${}_{k|n}A_x$ denotes $A_x(\mathbf{0}_k, \mathbf{1}_n)$. This is deferred insurance. A level death benefit of 1 unit
begins after k years and continues for n years. Such a policy would not normally be
sold by itself but may be combined with other policies.


• ${}_{k|}A_x$ stands for ${}_{k|\omega-x-k}A_x$ in accordance with the usual practice of omitting duration
symbols when the contract continues for life.


• $(DA)^1_{x:\overline{n}|}$ denotes $A_x(n, n-1, \ldots, 1)$. The D stands for decreasing.


• $(IA)^1_{x:\overline{n}|}$ denotes $A_x(1, 2, \ldots, n)$. The I stands for increasing.


• $(DA)_x$ and $(IA)_x$ are respectively the above two symbols with $n = \omega - x$, in keeping with
the general notational principle discussed in Section 4.7.


5.8.2 Annual-premium notation 


The standard symbol for a level net annual premium is P. This is followed by the single- 
premium symbol. For those single premiums that begin with a capital A, the A is omitted. The 
premium payment duration t appears before the P on the lower left. If omitted, it means that 
premiums are paid for the natural duration of the contract. Examples follow: 


• $P_x$ is the annual premium, payable for life, for a 1-unit whole life policy on (x).


• ${}_tP_x$ is the annual premium payable for t years for a 1-unit whole life policy on (x).


• $P^1_{x:\overline{n}|}$ is the annual premium payable for n years for a 1-unit, n-year term policy on (x).


• ${}_tP_{x:\overline{n}|}$ is the annual premium payable for t years for a 1-unit n-year endowment insurance
on (x).


• $P((IA)^1_{x:\overline{n}|})$ is the annual premium payable for n years for an n-year increasing term
policy on (x). (In this case we insert the full single-premium symbol since it does not
begin with an A.)


• $P({}_{n|}\ddot{a}_x)$ is the level annual premium payable for n years, for a deferred annuity providing
income of 1 unit per year beginning at age x + n. Note here that the missing premium
payment duration symbol is taken as n, the natural premium payment duration. Although
the contract continues for life, it is not natural to continue paying premiums when the
annuity payments begin.


All of these annual premium symbols are evaluated by taking the corresponding net single
premium and dividing by $\ddot{a}_{x:\overline{t}|}$, where t is the premium paying duration. For example,

$${}_{10}P_{50:\overline{20}|} = \frac{A_{50:\overline{20}|}}{\ddot{a}_{50:\overline{10}|}}.$$


5.8.3 Identities


The same types of identity that we discussed in Section 4.7 arise with insurances as well. For
example,

$$A_{x:\overline{n}|} = A^1_{x:\overline{k}|} + v(k)\,{}_kp_x\,A_{[x]+k:\overline{n-k}|}.$$


The derivation will be left to the reader. 


5.9 Spreadsheet applications 


We modify the Chapter 4 Spreadsheet to handle death benefits. The death benefit vector is 
entered into Column G. The vector $\mathbf{w}_x * \mathbf{b}$ is calculated in column L by putting the following
formula in L10 and copying down


=G10*(1+B10)^(-1)*C10


In G8 we then put the same formula as in F8 except with L replacing F, and this will return
$A_x(\mathbf{b})$.


The entry in I6 is changed to =(F8+G8)/H8. 
The premium vector appears in Column I. 


For a check, the formula in H8 can be copied to I8 and this cell should return a total of F8 
and G8. 

For a sample problem, compute the initial premium for a 1000-unit, 30-year endowment 
policy on (40) with premiums payable for 20 years, and the premium to double after 10 years. 
The interest rate is a constant 6%. The answer is 12.68. 


Exercises 


Type A exercises 


5.1 Given that $\ell_{70} = 1000$, $\ell_{71} = 960$, $\ell_{72} = 912$, and that interest rates are a constant 
10%, calculate $A_{70}(1_2)$. 


5.2. A 3-year endowment insurance policy on (60) provides for benefits paid at the end of 
the year of death of: 500 if death occurs in the first year (i.e., between time 0 and time 
1); 800 if death occurs in the second year; and 1000 if death occurs in the third year. 
In addition, there is a pure endowment of 1000 payable at age 63 if (60) is then alive. 
This is purchased by three annual premiums beginning at age 60. The second premium 
is double the initial premium and the third premium is three times the initial premium. 
You are given that $q_{60} = 0.1$, $q_{61} = 0.2$ and $q_{62} = 0.25$. The interest rate is 25% 
for the first 2 years and 20% after that. Find the initial premium. 


5.3 A 2-year term insurance policy on (60) provides for a death benefit of 100 payable at 
the end of the year of death. This is purchased by a single premium. If (60) lives to age 
62, the single premium is returned without interest. Given that $q_{60} = 0.1$, $q_{61} = 0.15$, 
and interest is a constant 10%, find the single premium. 


5.4 You are given $q_x = 0.053$, $q_{x+1} = 0.054$, $q_{x+2} = 0.055$, $i_0 = 0.06$, $i_1 = 0.08$, $i_2 = 0.10$. 
If $\mathbf{b} = (1, 2, 3)$, find the vector $\mathbf{c} = \mathbf{b} * \mathbf{w}_x$. 


Type B exercises 


5.5 A deferred life annuity on (40) provides for a yearly income of 1000 beginning at age 
65 and continuing for life. It is to be purchased by a single premium of S payable at 
age 40. If death occurs during the deferred period (i.e., during the first 25 years), then 
the single premium is refunded without interest at the end of the year of death. 


(a) Give a formula for S using the symbols $\ddot{a}$ and $A$. 


(b) You are now given the following information. The same annuity contract without 
the premium-refund feature (i.e., nothing is paid during the deferred period) can be 
purchased for a single premium of 2000. In addition, a contract that provides the 
same annuity benefits, plus a level death benefit of 2000, payable at the end of the 
year of death, for death during the deferred period, can be purchased for a single 
premium of 2200. Calculate an exact numerical value for S. 


5.6 A certain electrical appliance is sold with a 5-year guarantee. This provides that the full 
purchase price is refunded if the product fails within 2 years, and half of the purchase 
price is refunded if the product fails in the following 3 years. A study shows that out 
of a typical batch of 100 items, there will be 2 failures in the first year, 3 failures in the 
second year and 4 failures per year after that. Assuming that interest is a constant 5% 
and that reimbursement is made at the end of the year of failure, what is the cost of this 
guarantee to the manufacturer, as a percentage of the purchase price? 


5.7 What is $A_x(1_\infty)$ if the constant interest rate i = 0? Give both a formal derivation, and a 
proof by general reasoning. 


5.8 Suppose that interest is constant and $q_y$ is a constant $q$ for all $y$. Find an expression for 
$A_x$ in terms of $q$ and $v$. 


5.9 An actuary calculates a single premium for a certain life insurance policy on (40), 
and then discovers there were two errors made. In the first place, the life table used 
showed a value of $q_{40}$ that was only one-half of what the correct figure was. Second, the 
first-year death benefit was taken as 20 when it should have been 10. Will the correct 
premium be the same as, lower, or higher than the one calculated? 


5.10 For a certain insurance contract on (50), the death benefit for the first year of the 
contract is 1100, payable at the end of the year of death. The single premium for the 
whole contract is 600. This is based on an interest rate of 10% for the first year and a 
mortality table with $q_{50} = 0.20$. If the value of $q_{50}$ is changed to 0.25, while all other 
values of $q_y$ are unchanged, what is the new single premium? 


5.11 There is a constant interest rate of 20%, and 


$A_{50} = 0.300$, $v^{10}\,{}_{10}p_{50} = 0.10$, $A_{60} = 0.400$, $q_{60} = 0.20$. 
Suppose that $q_{60}$ is changed to 0.23, while all other values of $q_y$ remain unchanged. 
What is the new value of $A_{50}$? 


5.12 The cash flow vector $\mathbf{j} = (1, 2, 3, \ldots, 10)$. You are given that interest is constant and 
that 
$\ddot{a}_x(\mathbf{j}) = 30$, $A_x(1_{10}) = 0.10$, $\ddot{a}_x(1_{10}) = 7$, $v^{10}\,{}_{10}p_x = 0.48$. 
Find $A_x(\mathbf{j})$. 


5.13 Consider a whole life policy on (x) with a level death benefit of 1. Suppose that, for 
some age y > x, the value of $q_y$ is increased while all other values of q remain the same. 

(a) Show that $A_x$ is increased. 


(b) Show that $P_x$ is increased, where $P_x$ is the level annual premium payable for life, 
for this policy. 


(c) Show by example that the above statements are not necessarily true for a whole life 
policy with a non-constant death benefit. 


5.14 (a) Suppose that, for all x, $q_{x+1} \ge q_x$. Show that $A_{x+1} \ge A_x$. 

(b) Does the above remain true if we remove the monotone condition on $q_x$? 


5.15 Show that for the vector $\mathbf{c}$ given by formula (5.5), and k = 0, 1, 2, ..., we have 

(a) $\ddot{a}_x({}_k\mathbf{c}) = A_x({}_k\mathbf{b})$; 

(b) $\ddot{a}_{[x]+k}(\mathbf{c} \circ k) = A_{[x]+k}(\mathbf{b} \circ k)$. 


5.16 A deferred annuity on (40) provides for an income of 1000 per year for life beginning 
at age 60. If (40) lives to age 60, the first 10 annuity payments are guaranteed regardless 
of whether (40) is alive or not. Level annual premiums of P are payable for 10 years, 
beginning at age 40. If (40) dies before the annuity begins, all premiums paid prior 
to death are returned at the end of the year of death, and no annuity payments are 
made. Find a formula for P, assuming: (a) premiums are returned without interest; (b) 
premiums are returned with interest. 


5.17 One could generalize our definition of an insurance contract by stipulating that for 
death at time t, the death benefit is paid at time $\tau(t)$, which is some function of t. (For example, 
when benefits are payable at the end of the year of death, $\tau(t) = [t] + 1$, where $[\cdot]$ is the 
greatest integer function.) Show that a pure endowment contract can be considered as 
an insurance, in this sense. 


5.18 An annuity on (x) provides 1 per year for life beginning at time 0, with the further 
provision that n additional payments will be made after (x) dies, beginning at the 
end of the year of death. Interest is constant. Show that the net single premium is 
$\ddot{a}(1_n; v) + v^n\ddot{a}_x$. Derive this formula in two ways. (a) By using (5.11). (b) By using (5.2). 


5.19 Redo Example 5.8 only now assuming that for death during the first 5 years, one-half 
of all premiums paid prior to death are returned at the end of the year of death. 
5.20 A contract on (70) provides for a payment of 1000 at the end of 3 years if (70) is then 
alive, and is to be paid for by three level annual premiums. For death in the first year, 
nothing is paid. For death in the 2nd or 3rd years, all premiums paid are returned with 
interest at the end of the year of death. You are given that $q_{70} = 0.1$, $q_{71} = 0.2$, $q_{72} = 0.3$ 
and i = 25%. Find the annual premium. 


5.21 Provide an algebraic proof that the two methods given for the solution of Example 5.8 
produce the same answer. (Hint: Change the order of summation in a double sum.) 


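5.22 Provide an algebraic proof and give an intuitive explanation of the following identity. 
For any vector $\mathbf{c} = (c_0, c_1, \ldots, c_n)$, 


$\ddot{a}_x(\mathbf{c}) = v(n)\,{}_np_x\,\mathrm{Val}_n(\mathbf{c}; v) + A_x(\mathbf{j})$, 


where $j_k = \mathrm{Val}_{k+1}({}_{k+1}\mathbf{c}; v)$, $k = 0, 1, \ldots, n - 1$, and ${}_k\mathbf{c}$ is as defined before Equation 2.18. 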


Spreadsheet exercises 


The following exercises are to be done using the sample life table of Section 3.7. 


5.23 A contract on (40) provides for death benefits if death occurs in the next 25 years. The 
amount of the death benefit is 50 000 for the first 10 years and 100 000 for the next 
15 years. If (40) lives to age 65, he/she will receive a life annuity of 10 000 per year 
for life beginning at age 65. Premiums are payable for 15 years beginning at age 40. 
The premium doubles after 5 years. Interest rates are 5% for the first 20 years and 6% 
thereafter. Find the initial premium. 


5.24 Interest rates are a constant 6%. A contract on (50) provides for a payment of 10 000 at 
age 70 if then alive. Level premiums of P are paid only at even-numbered times, that 
is, at age 50, 52, 54, .... If (50) dies before age 70 there is a return at the end of the 
year of death of all premiums paid prior to death. Find P. 


5.25 Assume a constant interest rate of 4%. Find the premium for the annuity in Example 
5.9 if: (a) x = 40; (b) x = 70. 


Insurance and annuity reserves 


6.1 Introduction to reserves 


Given an insurance or annuity contract and a duration k, the reserve at time k is defined exactly 
as in Definition 2.7. It is the amount that the insurer needs at time k in order to ensure that 
obligations under the contract can be met. Calculating reserves for each policy is an important 
responsibility of the actuary, known as valuation. The insurer wants to be confident that funds 
on hand, together with future premiums and investment earnings, are sufficient to pay the 
promised future benefits. It is important to thoroughly master the concept of insurance and 
annuity reserves in order to properly understand and analyze the nature of these contracts. 

Throughout this chapter we will deal with the following model. As usual we start with a 
fixed investment discount function v and a life table. We have a contract issued on (x) with 
death benefit vector $\mathbf{b}$, annuity benefit vector $\mathbf{c}$ and premium vector $\boldsymbol{\pi}$. (For simplicity we 
will omit the possibility of guaranteed payments in our discussion, but this feature can easily 
be incorporated if desired.) Recall from Section 5.5 that we can view the death benefits as a 
vector of annuity benefits $\mathbf{b} * \mathbf{w}_x$, where $(\mathbf{w}_x)_k = v(k, k+1)\,q_{x+k}$. We can then form the net 
cash flow vector 


$\mathbf{f} = \boldsymbol{\pi} - \mathbf{b} * \mathbf{w}_x - \mathbf{c}$, (6.1) 


which indeed represents the net cash flow on the contract from the insurer's viewpoint. The 
insurer will collect premiums of $\boldsymbol{\pi}$ and pay out death benefits in the form of $\mathbf{b} * \mathbf{w}_x$ and annuity 
benefits of $\mathbf{c}$. The reserve at time k on the contract is the reserve for the vector $\mathbf{f}$ with respect 
to the interest and survivorship discount function, as given in Definition 2.6. Denoting the 
reserve by ${}_kV$, we have 


${}_kV = {}_kV(\mathbf{f}; y_x) = -\mathrm{Val}_k({}^k\mathbf{f}; y_x)$. (6.2) 
It is often useful to write this in an alternate way that keeps benefits and premiums separate. 
That is, we note that the reserve at time k is the value at time k of future benefits less the value 
at time k of future premiums. This can be written in terms of the standard $A$ and $\ddot{a}$ symbols as 


${}_kV = \dfrac{1}{y_x(k)}\left[A_x({}^k\mathbf{b}) + \ddot{a}_x({}^k\mathbf{c}) - \ddot{a}_x({}^k\boldsymbol{\pi})\right]$, (6.3) 


or, equivalently (and the way in which it usually appears in the literature) as 


${}_kV = A_{[x]+k}(\mathbf{b} \circ k) + \ddot{a}_{[x]+k}(\mathbf{c} \circ k) - \ddot{a}_{[x]+k}(\boldsymbol{\pi} \circ k)$. (6.4) 


Here we are using (4.8) and the corresponding statement for A. 
Under the assumption that premiums are actuarially equivalent to benefits, we can also 
calculate the reserve retrospectively as 


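$B_k(\mathbf{f}; y_x) = \dfrac{1}{y_x(k)}\left[\ddot{a}_x({}_k\boldsymbol{\pi}) - \ddot{a}_x({}_k\mathbf{c}) - A_x({}_k\mathbf{b})\right]$. (6.5) 
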
Under this formulation the reserve is the value at time k of the past premiums less the value at 
time k of past benefits. The reader is cautioned that the reserve cannot be so calculated when 
premiums and benefits are not actuarially equivalent. 


Example 6.1 For the policy of Example 5.5, find the reserves at time 1 and time 2. See 
Figure 6.1. As in Chapter 2 we use an arrow to mark the point at which values are computed. 


Benefits          80        75   70 
Time         0         1         2 
Premiums     16        16 
v's               1/2       1/2 
q's               0.2       0.4 
                       ↑ 


Figure 6.1 Example 6.1 


Solution. 


Value at time 1 of future death benefits = 75(1/2)(0.4) = 15. 
Value at time 1 of future annuity benefits = 70(1/2)(0.6) = 21. 
Value at time 1 of future premiums = 16. 


${}_1V = 15 + 21 - 16 = 20$. 

The above approach is recommended for hand calculation. All reserve calculations can be 
handled in exactly the same way, although there will generally be more than one summand in 
each of the three items. 
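
The hand calculation can be mirrored in a few lines of Python. This is a sketch under our own conventions, not a procedure taken from the text: the function name `prospective_reserve` and the list layout (benefits, premiums and rates indexed by year) are assumptions made for illustration.

```python
def prospective_reserve(k, b, c, pi, v, q):
    """Value at time k of future benefits less future premiums.

    b[j]: death benefit paid at time j+1 for death in year j+1
    c[j], pi[j]: annuity benefit and premium due at time j to or from survivors
    v[j], q[j]: one-year discount factor and mortality rate for year j+1
    """
    value = 0.0
    y = 1.0  # interest-and-survivorship discount from time j back to time k
    for j in range(k, len(c)):
        value += y * (c[j] - pi[j])
        if j < len(v):
            value += y * b[j] * v[j] * q[j]
            y *= v[j] * (1.0 - q[j])
    return value


# Data of Example 6.1
b = [80, 75]
c = [0, 0, 70]
pi = [16, 16, 0]
v = [0.5, 0.5]
q = [0.2, 0.4]

for k in range(3):
    print(k, prospective_reserve(k, b, c, pi, v, q))
```

With net premiums the value at time 0 should come out as zero (up to rounding), which gives a quick consistency check on the inputs.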




Additional information can be obtained by first computing the net cash flow vector, 
which also provides the best procedure for spreadsheet calculation. To illustrate, the vector 
$\mathbf{w}_{60} = (0.1, 0.2, -)$. (We don't have information to compute the third entry but it is irrelevant, 
since it is multiplied by 0.) Then, $\mathbf{b} * \mathbf{w}_{60} = (8, 15, 0)$, $\mathbf{c} = (0, 0, 70)$, $\boldsymbol{\pi} = (16, 16, 0)$, so that 
the net cash flow vector $\mathbf{f}$ is $(8, 1, -70)$. For present purposes, we can forget about the 
particular death benefits and premiums. From the insurer's viewpoint the contract can be 
viewed as simply one of collecting 8 at time 0, 1 at time 1, and paying back 70 at time 2. From 
(6.2), ${}_1V$ is just the negative of the value at time 1, with respect to interest and survivorship, 
of the payments at times 1 and 2. This equals $-1 + 70(0.5)(0.6) = 20$, as above. 

Similarly, ${}_2V$ is just the value at time 2 of the payment at time 2, which is 70. In general, 
for a contract running for n years, the reserve at time n is just the payment due at time n to the 
survivors. (Recall that from the convention introduced in Chapter 1, reserves are calculated 
before this payment.) For n-year term insurance, when there is nothing payable to survivors 
at the end, the nth year reserve will equal 0. 

Since we have used net premiums, we can calculate balances as a check. 


$B_1(\mathbf{f}) = \dfrac{8}{y_{60}(1)} = \dfrac{8}{0.4} = 20$, 

$B_2(\mathbf{f}) = \dfrac{1}{y_{60}(2)}\,[8 + 0.4] = \dfrac{1}{0.12}\,[8 + 0.4] = 70$, 


which agree with our previous calculations. 
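
The net cash flow approach is easy to script. The following is a rough sketch using the figures of Example 6.1; the variable names and the helper `balance` are our own, and the unknown third entry of the mortality vector is set to 0.0 purely as a placeholder, since it is multiplied by a zero benefit.

```python
b = [80, 75, 0]          # death benefits
c = [0, 0, 70]           # annuity (pure endowment) benefits
pi = [16, 16, 0]         # premiums
v = [0.5, 0.5]           # one-year discount factors
q = [0.2, 0.4]           # mortality rates

# w[k] = v(k, k+1) * q_{x+k}; third entry is a placeholder (assumption)
w = [v[0] * q[0], v[1] * q[1], 0.0]

# Net cash flow vector f = pi - b*w - c, entry by entry
f = [pi[k] - b[k] * w[k] - c[k] for k in range(3)]

# y[k]: interest-and-survivorship discount factor from time k to time 0
y = [1.0]
for v_k, q_k in zip(v, q):
    y.append(y[-1] * v_k * (1.0 - q_k))


def balance(k):
    """Retrospective balance: past net cash flows accumulated to time k."""
    return sum(f[j] * y[j] for j in range(k)) / y[k]


print(f)
print(balance(1), balance(2))
```

Because only the single vector `f` is carried forward, the same few lines serve for any mix of death benefits, annuity benefits and premiums.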
The following example exhibits a point of interest. 


Example 6.2 For a 4-year endowment insurance on (60), $b_2 = 100$, $b_3 = 200$ and there 
is a pure endowment of 200 paid at age 64 if the insured is then alive. You are given that 
$\pi_2 = 10$, $\pi_3 = 20$, $q_{62} = 0.1$. The interest rate after 2 years is 25%. Find ${}_2V$ and ${}_3V$. See 
Figure 6.2. 


Benefits                              100       200  200 
Time         0         1         2         3         4 
Premiums                         10        20 
v's                                   0.8       0.8 
q's                                   0.1        ? 
                                 ↑ 


Figure 6.2 Example 6.2 


Solution. 


Value at time 2 of death benefits = 100(0.8)(0.1) + 200(0.64)(0.9)$q_{63}$. 
Value at time 2 of annuity benefits = 200(0.64)(0.9)$p_{63}$. 
Value at time 2 of premiums = 10 + 20(0.8)(0.9) = 24.40. 

We don't know $q_{63}$, but it is not needed. Since $q_{63} + p_{63} = 1$, the two unspecified terms sum 
to 200(0.64)(0.9) = 115.20. The remaining death benefit term is 100(0.8)(0.1) = 8, and 


${}_2V = 8 + 115.20 - 24.40 = 98.80$. 


Similarly, ${}_3V = 200(0.8) - 20 = 140$. 

This example shows that for an endowment insurance (with benefits paid at the end of the 
year of death) where the pure endowment is the same amount as the final death benefit, we do 
not need to know the mortality rate for the final year. This is in fact evident, since the insured 
gets that amount whether they live or not. 
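
A short numerical experiment makes the same point. The sketch below recomputes the reserve at time 2 for several trial values of the final-year mortality rate; the function name `reserve_at_2` and the trial values are our own assumptions, introduced only for illustration.

```python
def reserve_at_2(q63):
    """Reserve at time 2 for Example 6.2, for a trial final-year mortality rate."""
    v = 0.8          # one-year discount factor at 25% interest
    q62 = 0.1
    p62 = 1.0 - q62
    death = 100 * v * q62 + 200 * v * v * p62 * q63
    endowment = 200 * v * v * p62 * (1.0 - q63)
    premiums = 10 + 20 * v * p62
    return death + endowment - premiums


# Hypothetical trial values for the unknown rate
trial_rates = [0.0, 0.15, 0.5, 1.0]
values = [reserve_at_2(rate) for rate in trial_rates]
print(values)
```

The printed values agree up to floating-point rounding, because the final death benefit and the pure endowment are for the same amount and are paid at the same time.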


Remark There is a subtle point involved with Equation (6.4), which is important to note 
for the actual calculation of reserves. We have already introduced the idea in Section 2.10.3. In 
our model we choose an investment discount function and a mortality table, and these remain 
fixed for the duration of a contract. In practice, when one reaches time k and actually wants to 
compute ${}_kV$ these assumptions may well have changed. The reserve computed at time k will be 


$A'_{[x]+k}(\mathbf{b} \circ k) + \ddot{a}'_{[x]+k}(\mathbf{c} \circ k) - \ddot{a}'_{[x]+k}(\boldsymbol{\pi} \circ k)$, 


where the primes indicate quantities that are calculated with the new interest and survivorship 
function $y'_{[x]+k}$, as computed under what could be changed conditions at time k. Of course when 
there is no change in assumptions, $y'_{[x]+k}$ will just equal $y_x \circ k$ and formula (4.6) shows that 
both formulas are the same. The point is then, that in practice, one does not really know what 
${}_kV$ will be before time k. In this more realistic setting, we can view the original formula (6.4) 
as the best estimate one could make of ${}_kV$, if asked to compute it at time 0. 


6.2 The general pattern of reserves 


Are insurance reserves generally positive or negative? Paradoxically, we will motivate the 
answer to this question by providing an example where they are neither. 


Example 6.3 An insurance policy provides a death benefit of 1 paid at the end of the year 
of death. Level annual premiums are payable for life. The interest rate is constant and the 
value of $q_x$ is a constant $q$ for all x. Find the reserves and give an explanation for the answer. 


Solution. Let $p = 1 - q$ be the constant value of $p_x$. Then ${}_kp_x = p^k$, which is never 0. There 
is a positive probability of living to any age, so we have an example in which $\omega = \infty$. Our 
vectors will be of infinite length. 

Let v denote the constant value of $v(k, k+1)$. Then $\mathbf{w}_x$ is a vector with a constant entry 
of $vq$ and the premium pattern vector $\boldsymbol{\rho}$ is a vector with a constant entry of 1. The net level 
premium $\pi$ is clearly $vq$, since that will make the vectors $\mathbf{w}_x * \mathbf{b}$ and $\boldsymbol{\pi}$ not only actuarially 
equivalent but actually equal to each other. The net cash flow vector $\mathbf{f}$ is then not just a 
zero-value vector but actually equal to the zero vector. All reserves will be 0. 

What is happening here is that the premium of vq, collected each year, will accumulate to 
q at the end of the year, and this will be exactly sufficient to pay the death benefits due at that 
time. There will be nothing left over, so balances, and therefore reserves, equal zero. 
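
The claim that the net level premium equals vq can also be checked numerically. The sketch below uses an interest rate of 5% and q = 0.02, both hypothetical values chosen for illustration, and truncates the infinite sums at a point where the remaining terms are negligible.

```python
i = 0.05        # assumed constant interest rate (illustrative)
q = 0.02        # assumed constant mortality rate (illustrative)
v = 1.0 / (1.0 + i)
p = 1.0 - q

N = 2000        # truncation point for the infinite sums

# Value of a death benefit of 1 paid at the end of the year of death
insurance = sum(v ** (k + 1) * p ** k * q for k in range(N))

# Value of 1 per year payable at the start of each year while alive
annuity = sum(v ** k * p ** k for k in range(N))

premium = insurance / annuity
print(premium, v * q)
```

Term by term, each summand of the insurance value is vq times the matching summand of the annuity value, which is why the ratio does not depend on where the sums are cut off.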
The situation above is typical for many forms of insurance, such as automobile or property 
coverage, and insurers of such risks have little in the way of reserves. The given scenario is, 
however, not realistic for life insurance. The values of $q_x$ are not constant but increase with 
age. If one paid for the insurance 1 year at a time, the yearly premium per unit of $vq_x$ would 
rapidly increase and eventually become prohibitively high. As we noted in Chapter 5, the 
typical life insurance policies will level the premiums out. In most cases, policyholders are 
paying more in premiums than they need to in the early years, but not enough in the later 
years. At any point of time after time zero, future premiums will not be sufficient to cover 
the remaining benefits. The excess collected in the early years is used to cover this deficit. It 
is expected therefore that reserves are usually positive. This has important implications for 
the life insurance industry. It means that investing becomes a major activity, as life insurance 
companies tend to accumulate large amounts of assets. Some critics, with little understanding 
of insurance, look at these holdings of real estate, stocks and bonds, and claim that they 
represent unfair profits made at the expense of the policyholder. The truth is, however, that a 
large portion of these assets represent reserves, which in effect belong to the policyholders, 
as they will be used to pay the future benefits. 

Negative reserves can arise on policies where the cost of the insurance benefits is decreasing 
each year. An example is a policy with a rapidly decreasing benefit amount, where despite 
the increase in $q_x$, the quantity $b_k\,v(k, k+1)\,q_{x+k}$ decreases. Some examples appear in the 
exercises. Similarly, negative reserves can arise in the case where the premiums increase rapidly 
rather than remaining level. Insurers try to avoid such a situation if possible. A negative 
reserve means, viewing things prospectively, that the policyholder owes money to the insurer, 
which will be provided by future premiums, or equivalently, looking at things retrospectively, 
that the policyholder has received coverage but not yet paid for it. The problem is that the 
policyholder may stop paying premiums on the policy, leaving an unpaid debt. 


6.3 Recursion 


In this section, we develop some important recursion formulas. It is convenient to make a 
slight alteration in notation. We will incorporate the annuity payments with the premiums and 
let $\boldsymbol{\pi}$ denote $\boldsymbol{\pi} - \mathbf{c}$. In other words, we think of annuity benefits as just negative premiums, 
which is really what they are, since the policyholder is receiving rather than paying these 
amounts. Our net cash flow vector $\mathbf{f}$ has entries $f_k = \pi_k - b_k\,v(k, k+1)\,q_{x+k}$, and from our 
basic recursion formula (2.26), 


${}_{k+1}V = \left({}_kV + \pi_k - b_k\,v(k, k+1)\,q_{x+k}\right) y_x(k+1, k)$. (6.6) 


Since $y_x(k+1, k) = (1 + i_k)/p_{x+k}$, this is sometimes written as 


${}_{k+1}V = ({}_kV + \pi_k)\,\dfrac{1 + i_k}{p_{x+k}} - b_k\,\dfrac{q_{x+k}}{p_{x+k}}$. (6.7) 

The recursion is started with the initial value ${}_0V = -\ddot{a}_x(\mathbf{f})$, which will be 0 under our standard 
assumption of net premiums. 
It is instructive to note that the factor $q_{x+k}/p_{x+k}$ in the last term is equal to $d_{x+k}/\ell_{x+k+1}$. From this, 
we see easily that it is the amount, per unit of death benefit, that each survivor must pay at the 
end of the year to provide the benefits paid to those who died during the year. 

Formula (6.7) takes a retrospective viewpoint, and says that the reserve at time k + 1 is 
obtained from that at time k, by adding the premium, accumulating at interest and survivorship 
for 1 year, and then subtracting enough to pay the death benefits. It is known as the Fackler 
reserve accumulation formula, named after one of the early North American actuaries, David 
Parks Fackler. In pre-computer days it was a popular method for calculating reserves. It is 
used infrequently for calculating purposes now, but is still useful for illustrating how the life 
insurance reserve changes from one period to the next. 
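
As an illustration of how the accumulation proceeds year by year, here is a minimal Python sketch of the recursion, applied to the figures of Example 6.1 (premiums of 16, death benefits of 80 and 75, interest of 100%, mortality rates 0.2 and 0.4). The function name `fackler_reserves` and its argument layout are our own assumptions. Following the convention of this section, any annuity benefits would be entered as negative premiums; the pure endowment of 70 at time 2 falls after the last recursion step and so does not enter the loop.

```python
def fackler_reserves(pi, b, i, q, initial=0.0):
    """Successive reserves from the accumulation recursion.

    pi[k]: premium (net of annuity benefits) paid at time k
    b[k]:  death benefit paid at time k+1 for death in year k+1
    i[k], q[k]: interest and mortality rates for year k+1
    """
    reserves = [initial]
    for pi_k, b_k, i_k, q_k in zip(pi, b, i, q):
        p_k = 1.0 - q_k
        accumulated = (reserves[-1] + pi_k) * (1.0 + i_k) / p_k
        death_cost = b_k * q_k / p_k   # each survivor's share of death benefits
        reserves.append(accumulated - death_cost)
    return reserves


example_reserves = fackler_reserves([16, 16], [80, 75], [1.0, 1.0], [0.2, 0.4])
print(example_reserves)
```

The final entry of the list is the reserve at the end of the term, which should match the amount payable to survivors at that time.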


Remark The quantity ${}_kV + \pi_k$ in the above formula is often called the initial reserve at time 
k as it represents the reserve at the beginning of the year, after the payment of the premium. 
In contrast, ${}_kV$ is sometimes referred to as the terminal reserve at time k, reflecting the fact 
that it is the reserve at the end of the year, prior to the payment of the premium for the following 
year. 


Alternate versions of this formula provide instructive information. We first give an important 
definition. 


Definition 6.1 The quantity $b_k - {}_{k+1}V$ is known as the net amount at risk for the (k + 1)th 
year and will be denoted by $\eta_k$ (the subscript is chosen to correspond to $b_k$). (There are various 
other names in the literature, such as death strain at risk.) 


Now, multiplying (6.7) by $p_{x+k} = 1 - q_{x+k}$ and rearranging, we get 


${}_{k+1}V = ({}_kV + \pi_k)(1 + i_k) - q_{x+k}\,\eta_k$. (6.8) 
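
The rearrangement can be spelled out step by step. The lines below are a sketch that takes the accumulation formula in the form ${}_{k+1}V = ({}_kV + \pi_k)(1 + i_k)/p_{x+k} - b_k\,q_{x+k}/p_{x+k}$ and writes $\eta_k = b_k - {}_{k+1}V$ for the net amount at risk:

```latex
\begin{align*}
p_{x+k}\,{}_{k+1}V &= ({}_kV + \pi_k)(1 + i_k) - b_k\,q_{x+k} \\
(1 - q_{x+k})\,{}_{k+1}V &= ({}_kV + \pi_k)(1 + i_k) - b_k\,q_{x+k} \\
{}_{k+1}V &= ({}_kV + \pi_k)(1 + i_k) - q_{x+k}\left(b_k - {}_{k+1}V\right) \\
          &= ({}_kV + \pi_k)(1 + i_k) - q_{x+k}\,\eta_k .
\end{align*}
```

The only step that is not pure bookkeeping is moving the term $q_{x+k}\,{}_{k+1}V$ to the right hand side, which is what turns the full death benefit into the net amount at risk.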


Formula (6.8) reflects the fact that we can also view the accumulation of funds on an 
insurance policy as an interest-only investment, rather than as an interest and survivorship 
investment. From this point of view, the policyholder keeps the reserve when he/she dies (the 
amount accumulated in her box), but then the insurer only needs to make up the difference 
as a death benefit. The insurer is therefore at risk only for the difference between the death 
benefit and reserve, which is the source of the name. 

Readers who looked at Section 2.11 will note that this is a special case of the change of 
discount function that we investigated there. In this case $y_x$ and $b_k$ are replaced by $v$ and $\eta_k$, 
respectively. 


6.4 Detailed analysis of an insurance or annuity contract 

In this section we use (6.8) to provide a detailed discussion of the workings of an insurance 
policy. 


6.4.1 Gains and losses 


In practice, interest and mortality rates will not conform exactly to those provided by our 
model. In a particular year, the insurer may earn more interest than predicted by the given 
discount function, which will result in gains. For an insurance contract, there may be fewer 
deaths than predicted by the given life table, also causing gains. If the insurer earns less 
interest than expected, and/or there are more deaths than expected, there will be losses. In any 
year, the actuary wants to analyze these gains or losses and see how much is due to investment 
earnings and how much is due to mortality. 

Suppose we wish to measure the gain for a particular policyholder over the period running 
from time k to time k + 1. At the beginning of the year, before premium payment, the 
policyholder's box will contain the amount ${}_kV$. At the end of the year the insurer must make 
certain that the box has ${}_{k+1}V$ in order that future obligations can be met. Anything in excess 
of that amount can be considered as a gain, taken out and added to general surplus funds. On 
the other hand, if there is less than ${}_{k+1}V$, the insurer will have to make up the deficit from 
general surplus funds and there will be a loss. We will derive some general formulas. Suppose 
the actual interest rate earned during this year was $i_k^*$ rather than $i_k$, and the actual rate of 
mortality was $q^*_{x+k}$ rather than $q_{x+k}$. Then, the actual amount accumulated at time k + 1 will 
be the right hand side of (6.8) with starred i and q. If we subtract the reserve, we obtain the 
total gain $G_k$ from that policy for that year as 


$G_k = ({}_kV + \pi_k)(1 + i_k^*) - q^*_{x+k}\,\eta_k - {}_{k+1}V$. (6.9) 

If we substitute for ${}_{k+1}V$ using the right hand side of (6.8), we can write this as 

$G_k = (q_{x+k} - q^*_{x+k})\,\eta_k + (i_k^* - i_k)({}_kV + \pi_k)$, (6.10) 


which gives us a decomposition of the gain by source. The first term gives the gain due to 
mortality, and the second term gives the gain due to interest. That is, the mortality gain is the 
difference between the expected and actual mortality rates times the net amount at risk. The 
interest gain is the difference between actual and expected interest rates times the amount of 
funds at the beginning of the year, after payment of the premium. 


Example 6.4 Refer back to the policy of Example 6.1. Suppose that in the first year of 
the policy, the interest earned was 50% instead of the predicted 100%, and the actual rate of 
mortality was 0.1 instead of the predicted 0.2. Find the total gain for the year, split into the 
portion due to interest and the portion due to mortality. 

Solution. Substituting directly from (6.10), the mortality gain is 

(0.2 - 0.1)(80 - 20) = 6, 

while the interest gain is 

(0.5 - 1)(0 + 16) = -8. 

There is a total gain of -2, or in other words a loss of 2, for each policy. 


We will verify this by working out an example in the aggregate. Suppose that 10 people 
age 60 buy this policy at a certain time. The insurer collects 160, and this accumulates at 50% 
interest to 240 at the end of the year. Out of this, the insurer will pay a death benefit of 80 
for the one death that occurred, leaving a total of 160. They have to put aside a total of 180, 
which is the reserve of 20 for each of the 9 survivors. There is an aggregate shortfall of 20, or 
2 from each policy, which has to be drawn from surplus. (Note that the gain given by (6.10) is 
for each policy in force at the beginning of the year, not at the end.) 
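
The split of the gain by source, and the aggregate check, can be reproduced with a short script. This is a sketch only: the function name `gain_by_source` and its parameter names are our own, chosen for illustration.

```python
def gain_by_source(reserve_start, premium, benefit, reserve_end,
                   i_expected, i_actual, q_expected, q_actual):
    """Return (mortality gain, interest gain) per policy in force at the start of the year."""
    net_amount_at_risk = benefit - reserve_end
    mortality_gain = (q_expected - q_actual) * net_amount_at_risk
    interest_gain = (i_actual - i_expected) * (reserve_start + premium)
    return mortality_gain, interest_gain


# First policy year of Example 6.4
mortality_gain, interest_gain = gain_by_source(
    reserve_start=0, premium=16, benefit=80, reserve_end=20,
    i_expected=1.0, i_actual=0.5, q_expected=0.2, q_actual=0.1)
total_gain = mortality_gain + interest_gain

# Aggregate check for 10 policies with one death during the year
fund = 10 * 16 * (1 + 0.5) - 1 * 80      # accumulated premiums less the death claim
required = 9 * 20                        # reserves for the 9 survivors
aggregate_gain = fund - required

print(mortality_gain, interest_gain, total_gain, aggregate_gain)
```

Dividing the aggregate figure by the 10 policies in force at the start of the year gives the per-policy total, which is the comparison made in the text.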

Formula (6.9) assumes that the premium paid is the net premium, calculated as in Chapter 5. 
In practice, the premiums actually charged on a policy will normally be different from 
the valuation premiums, which are net premiums determined from the interest and mortality 
assumptions used to compute reserves. (See Section 6.5 for more detail on this.) This 
necessitates an adjustment in our analysis. Suppose the premium paid at time k is actually $\pi_k^*$ rather 
than the valuation premium $\pi_k$. The first term on the right of (6.9) is then 


$({}_kV + \pi_k^*)(1 + i_k^*) = ({}_kV + \pi_k)(1 + i_k^*) + (\pi_k^* - \pi_k)(1 + i_k^*)$, 

which leads to an extra source of gain or loss. We now have 


$G_k = (q_{x+k} - q^*_{x+k})\,\eta_k + (i_k^* - i_k)({}_kV + \pi_k) + (\pi_k^* - \pi_k)(1 + i_k^*)$. (6.11) 


The third term represents the gain or loss arising from premiums that differ from the valuation 
premium. (We will elaborate on this point in Section 12.4.) 

One application of the formulas in this section is to dividend calculation. Insurers frequently 
issue what are termed participating policies, in which gains resulting from favourable 
investment and mortality experience are returned to the policyholder in the form of dividends. 
We will not go into further detail on this topic, but note that formula (6.10) is a basic tool in 
computing these amounts. 

Another use of gain and loss analysis is to determine how changes in the basis assumptions 
will affect the reserves. As an example, suppose that the reserve interest rate decreases. It 
is clear that valuation premiums will increase. In order to provide the same benefits with 
decreased earnings from investments, more must be collected from the policyholder. On the 
other hand, it is not clear how this will affect reserves, since at any time, both the present value 
of the future benefits and the present value of the future premiums will increase. Indeed, the 
effect will depend on the nature of the policy. In the usual case, however, with a level premium 
$\pi$ and reserves that increase with time, the lowering of the interest rate will increase reserves. 
To see this, suppose that interest rates are constant and there is a change of rate from i to 
$i^* < i$. This will cause an interest loss of $({}_kV + \pi)(i - i^*)$ for the year running from time k to 
k + 1. Since the reserves are increasing with time, the losses will also be increasing with time. 
There will be of course a new level valuation premium of $\pi + \Delta$ to cover these losses. Due to 
the increasing losses, it must be that $\Delta$ is greater than the losses in early years and less in later 
years. In the early years, that portion of $\Delta$ not used to cover the loss will be set aside as an extra 
reserve, to cover the greater losses to come in the future, and this will cause reserves to increase. 


6.4.2 The risk-savings decomposition 


We now look at a useful decomposition of the policy into a risk portion and a savings portion. 
Multiply Equation (6.8) by $v(k, k+1)$ and rearrange to obtain 


$\pi_k = v(k, k+1)\,q_{x+k}\,\eta_k + \left[v(k, k+1)\,{}_{k+1}V - {}_kV\right]$. 

This formula decomposes each premium into two parts. The first term is known as the risk 
portion of the premium, as it is that amount needed to buy insurance for 1 year for the net 
amount at risk. The remainder provides for the difference in reserves and is known as the 
savings portion of the premium. 


Example 6.5 Find the decomposition for the premiums of Example 6.1. 


Solution. In the first year the net amount at risk is 80 - 20 = 60. The risk portion of the 
premium is then (1/2) x 0.2 x 60 = 6. The savings portion is (1/2) x 20 - 0 = 10. (As a check, the 
two portions add up to the total premium.) In the second year the net amount at risk is 
75 - 70 = 5. The risk portion of the premium is (1/2) x 0.4 x 5 = 1 and the savings portion is 
(1/2) x 70 - 20 = 15. 


This decomposition shows that any policy can be viewed as being composed of two 
separate policies, the pure insurance part and the savings part. 

For the pure insurance part, the premium is the risk premium and the death benefit paid 
is the net amount at risk. This part of the policy has zero reserves, as the risk premium is just 
sufficient to purchase coverage for the net amount at risk for 1 year. In the above example, 
the policyholder pays 6 in the first year, which is exactly enough to purchase the coverage for 
the net amount at risk of 60. He/she then pays 1 in the second year, which is exactly enough 
to purchase coverage for the net amount at risk of 5. 

The savings part of the policy operates just like a bank account, with amounts accumulating 
at interest only. In the example above, the savings portion of 10 from the first premium 
accumulates to 40 at time 2 and the savings portion of 15 from the second premium accumulates 
to 30 at time 2. The total savings of 70 are then paid out as the pure endowment to all survivors 
at time 2. This is typical of endowment policies, including whole life, which operates as an 
endowment at age $\omega$, as we have indicated. The accumulated amounts from the savings portion 
of the premium increase steadily to the pure endowment amount. 
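
The decomposition is mechanical enough to script. The sketch below splits each premium of the endowment example into its risk and savings portions; the function name `risk_savings_split` and the list layout are assumptions of ours, not notation from the text.

```python
def risk_savings_split(b, v, q, reserves):
    """Return a list of (risk portion, savings portion) for each policy year.

    b[k]: death benefit for year k+1
    v[k], q[k]: one-year discount factor and mortality rate for year k+1
    reserves[k]: reserve at time k (one more entry than there are years)
    """
    split = []
    for k in range(len(b)):
        net_amount_at_risk = b[k] - reserves[k + 1]
        risk = v[k] * q[k] * net_amount_at_risk
        savings = v[k] * reserves[k + 1] - reserves[k]
        split.append((risk, savings))
    return split


# Endowment policy of Example 6.1: reserves 0, 20, 70 at times 0, 1, 2
portions = risk_savings_split([80, 75], [0.5, 0.5], [0.2, 0.4], [0, 20, 70])
for year, (risk, savings) in enumerate(portions, start=1):
    print(year, risk, savings, risk + savings)
```

In each year the two portions should add back to the premium actually paid, which is a convenient check on the reserves supplied.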

It is instructive to compare this with term insurance. 


Example 6.6 Redo Example 6.1 for the corresponding 2-year term policy without the 
endowment. That is, b still equals (80, 75, 0), but c = 0. 


Solution. In this case the premium π will equal (8 + 15(0.4))/1.4 = 10, and we have f =
(2, -5, 0). Then ${}_1V = 5$ and we know from the discussion following Example 6.1 that ${}_2V = 0$.
In the first year, the net amount at risk is 80 - 5 = 75, so the risk portion of the premium is
1/2 × 0.2 × 75 = 7.5, and the savings portion is 2.5. In the second year the net amount at risk is
75 - 0 = 75, so the risk portion of the premium is 1/2 × 0.4 × 75 = 15 and the savings portion is -5. This shows the
typical savings pattern on term policies. Modest savings are built up in early years, but these 
must be drawn on in later years when the premium is insufficient to pay for the insurance, 
resulting in a negative savings portion. As a check, the 2.5 deposited into the savings fund at 
time 0 will increase to 5 at time 1, which will then be withdrawn to make up for the deficit in 
the premium payable at that time. 
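
The decompositions in Examples 6.5 and 6.6 can be checked numerically. The following Python sketch is an illustration only: the function name `decompose` and its argument layout are our own choices, not part of the text. It computes the risk portion as the one-year cost of insuring the net amount at risk, and the savings portion as the discounted next reserve less the current reserve.

```python
from fractions import Fraction as F

def decompose(premiums, death_benefits, reserves, q, i):
    """Split each premium into (risk portion, savings portion).

    reserves[k] is the reserve at time k (one more entry than premiums);
    q[k] and i[k] are the mortality and interest rates for year k + 1.
    """
    parts = []
    for k, premium in enumerate(premiums):
        v = 1 / (1 + i[k])                               # one-year discount factor
        net_amount_at_risk = death_benefits[k] - reserves[k + 1]
        risk = v * q[k] * net_amount_at_risk             # cost of 1 year of insurance
        savings = v * reserves[k + 1] - reserves[k]      # increase in the savings fund
        assert risk + savings == premium                 # the portions must add up
        parts.append((risk, savings))
    return parts

q = [F(1, 5), F(2, 5)]      # mortality rates 0.2 and 0.4
i = [F(1), F(1)]            # 100% interest in both years

endowment = decompose([16, 16], [80, 75], [0, 20, 70], q, i)   # Example 6.5
term = decompose([10, 10], [80, 75], [0, 5, 0], q, i)          # Example 6.6
```

The endowment shows positive savings portions in both years, while the term policy shows the negative second-year savings portion described above.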


Exactly the same analysis as above holds for life annuity as well as life insurance contracts. 
In this case, the death benefits are all of zero amount, so that the net amount at risk will be 
negative. This is natural enough and reflects the fact that extra deaths in the case of annuities
result in gains. On single-premium annuities, we have $\pi_k = -c_k < 0$ for $k > 0$. Examples
appear in the exercises.


6.5 Bases for reserves 


Choosing the appropriate discount function and life table for the purpose of reserve calculation 
is another complex subject that we will only discuss briefly. As mentioned, the insurer must 
ensure that it has sufficient assets to cover its reserves, for if not, it will be in danger of
being unable to meet future obligations. It is usually thought that reserve calculations should 
use conservative assumptions, so there is a built-in safety margin should experience prove to 
be adverse. In other words, the discount function would use somewhat lower interest rates 
than actually expected and the mortality table would show more deaths than actually expected 
(or fewer in the case of annuity contracts). In several jurisdictions, the bases used for reserves 
are specified by insurance regulatory bodies, whose main goal is to ensure protection for the 
policyholders. The following example illustrates the resulting effect on profitability. 


Example 6.7 For the policy in Example 6.1, the company is required by legislation to 
compute reserves using a 50% interest rate rather than 100%. It actually does achieve the 
estimated 100% return on its investments, and mortality follows the predicted rates exactly. It 
still charges the premium of 16 based on the realistic interest rate of 100%. Analyze the effect 
of the conservative interest rate assumption on the company’s gains and losses. 


Solution. Redoing the calculations with an interest rate of 50% in place of 100% leads to a 
valuation premium of 544/23 and a first year reserve of 560/23, as the reader can verify. We 
will do an aggregate analysis. Suppose the insurer sells 2300 policies at age 60. It will collect 
total premiums of 36 800, which will accumulate with interest to 73 600 at the end of the year. 
Out of this it will pay 460 people a death benefit of 80 units, for a total death benefit payment 
of 36 800. This leaves 36 800. It now must set up a total reserve of 44 800, consisting of 
560/23 for each of the 1840 survivors. Therefore the loss shown for the first year of the policy 
is 8000. Looking at formula (6.11), the large loss in the third term more than offsets the gain
from the second term. 

In the second year, it starts with a reserve of 44 800. It collects another 16-unit premium 
from each of the 1840 survivors for a total of 29 440. This leaves a total of 74 240, which 
accumulates with interest to 148 480. Out of this it must pay a death benefit of 75 to each of
the 736 = 1840 × 0.4 people who died, and the pure endowment of 70 to each of the remaining
1104 people who survived. The total benefit payments are 55 200 + 77 280 = 132 480, which
leaves a gain for the year of 16 000.

For this group of policies, the insurer will show a loss of 8000 for the first year, and a 
gain of 16 000 for the second year. If the insurer had used the realistic interest rate of 100%
to compute reserves, there would have been no gains or losses. In effect the more conservative reserve
requires the insurer to borrow 8000 from surplus at the end of the first year, and then repay it 
with the 16 000 at the end of the second year, which is what the amount should be in view of 
the 100% interest rate. 
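
The aggregate analysis of Example 6.7 can be reproduced with exact fractions. The sketch below is illustrative only; the helper names `net_premium` and `prospective_reserve_1` are assumptions introduced here, and the policy data are hard-coded from Example 6.1.

```python
from fractions import Fraction as F

def net_premium(i):
    """Net level premium for the Example 6.1 policy at interest rate i."""
    v = 1 / (1 + i)
    y1 = v * F(4, 5)                  # interest and survivorship discount to time 1
    y2 = y1 * v * F(3, 5)             # ... and to time 2
    benefits = 80 * v * F(1, 5) + 75 * y1 * v * F(2, 5) + 70 * y2
    return benefits / (1 + y1)

def prospective_reserve_1(i, premium):
    """Reserve at time 1: value of future benefits less the premium then due."""
    v = 1 / (1 + i)
    return v * (F(2, 5) * 75 + F(3, 5) * 70) - premium

valuation_premium = net_premium(F(1, 2))                        # reserve basis: 50%
reserve_1 = prospective_reserve_1(F(1, 2), valuation_premium)

policies, charged, earned = 2300, 16, F(1)   # premium charged and return actually earned

# First year: premiums accumulate, death claims are paid, reserves are set up.
deaths_1 = policies * F(1, 5)
survivors_1 = policies - deaths_1
gain_1 = policies * charged * (1 + earned) - deaths_1 * 80 - survivors_1 * reserve_1

# Second year: start from the reserve, add premiums, pay claims and endowments.
deaths_2 = survivors_1 * F(2, 5)
survivors_2 = survivors_1 - deaths_2
fund_2 = survivors_1 * (reserve_1 + charged) * (1 + earned)
gain_2 = fund_2 - deaths_2 * 75 - survivors_2 * 70
```

Accumulating the first-year loss at the earned rate and adding the second-year gain gives zero, which is the sense in which the reserve basis changes only the timing of profit.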


This example shows that for our present model, with a single discount function, reserve 
assumptions do not affect the ultimate profitability of the insurer, since this depends solely 
on what actually happens. They can, however, change the incidence of this profit from year to
year. This means that reserve assumptions can have an effect on profitability when there are
different interest rates involved, and we discuss this further in Section 12.4.3.


6.6 Nonforfeiture values 


What should happen to policyholders who stop paying premiums before the term stated in the 
contract? This is known technically as withdrawal, lapse or surrender. In consideration for
the premiums they have already paid, they should be entitled to some reduced benefits under 
the policy. These are known as nonforfeiture benefits since they are benefits that were not 
forfeited by the cessation of premiums. In fact, in our simplified model, they should be entitled 
to take the reserve on their policy at any time they wish. Looking at this retrospectively, the 
reserve is the excess of the accumulated amount of their premiums over the accumulated cost 
of the insurance protection that they have received. It will be the amount that they have in their 
‘box’. In practice, insurers pay an amount that is somewhat less than the reserve for several 
reasons. This is a complex topic that we will only comment on briefly here. One reason is 
the high incidence of expenses in the early years of the policy (we discuss this more fully in 
Chapter 12). While these are accounted for by adding an amount to the premiums, the total 
amount of the initial expenses may not be recovered at the time of withdrawal. 

Another factor is the phenomenon known as anti-selection. This is a well-established 
concept in insurance which is simply a recognition of the fact that policyholders will make 
choices according to their own self-interest, acting on knowledge that they have, but that the 
insurer may not have. (The prefix 'anti' refers to the fact that it is the policyholder doing the 
selecting against the insurer.) On a life insurance contract, the option to withdraw is less likely 
to be exercised by an unhealthy policyholder than a healthy one. After all, if someone is told 
they will die within a few months from a terminal disease, they would be foolish to give up the 
policy. Consequently, the group that does not withdraw can be expected to experience higher 
mortality than normal. There is therefore an anti-selection expense to withdrawal, in the form 
of these higher mortality rates of the remaining policyholders. The principle followed here is 
that this expense should be borne equitably by all the policyholders, not just by those who 
remain. This is done by adjusting the amounts paid out in the case of withdrawal. 

The cash amount that will be paid to the withdrawing policyholder on a life insurance 
contract is known as the cash surrender value and is usually guaranteed at the time of issue 
for all durations. Normally, the policyholder is given the option of taking the nonforfeiture 
benefits in the form of a reduced level of insurance rather than in cash. The reduction can take 
the form of either a reduced amount of benefits, or a reduced term for the same benefits. 

Life annuities also present an obvious possibility for anti-selection. Unhealthy annuitants 
would find it worthwhile to end the contract and take their reserve, leaving a group who could
be expected to live longer than average, so that mortality losses would result. For this reason,
nonforfeiture benefits would not be offered on single-premium life annuity contracts, once the 
benefits commence. Depending on the contract, they might be present during a deferred period. 


6.7 Policies involving a return of the reserve 


At the time of death, the policyholder (or, more accurately, the estate of the policyholder) 
receives the death benefit. Had he/she decided to lapse the policy an instant before death, 
he/she would have received an amount close to the reserve. Some people who do not have
a complete understanding of life insurance have raised the complaint that the company is
confiscating the person’s reserve, since it is only returning the death benefit, and not the 
reserve, at death. The answer to this is that one must adopt one of two points of view. One 
can view the insurance policy as an interest and survivorship investment. In that case, it is 
true that the reserve is taken and spread among the survivors, but this is completely fair, as 
discussed in Chapter 4. Alternatively, one can view the policy as an interest-only investment. 
In this case the reserve is indeed available at death. But now, one must view the death benefit 
as the net amount at risk, rather than the originally stated amount.

It is possible to design a policy where a fixed amount, plus the reserve, is paid at death. The 
policyholder might then be told that the reserve is being paid in addition to the death benefit. 
This is of course just playing with words. What you really have is a policy with the pattern 
of death benefits worked out so that when you subtract the reserve you get some prescribed 
amount. Consider the following example. 


Example 6.8 A policy on (x) provides for a payment, at the end of the year of death, of 
1 plus the reserve, should death occur within n years. Level annual premiums of P are payable
for n years. Find a formula for P. 


Solution. A direct solution of this problem by the method outlined in Chapter 5 will cause 
difficulties, since P depends on the reserves, but the reserves in turn depend on P. While it 
may be possible to solve the resulting equations for small values of n, it is much better to take 
the interest-only view for the accumulation of money. That is, we use the discount function
v in place of y, and the net amount at risk $\eta_k$ in place of $b_k$. Expressing the insurance as an
annuity, we wish to solve

$$P\,\ddot{a}(1_n; v) = \ddot{a}(\eta * w_x; v),$$

where η is the vector $(\eta_0, \eta_1, \dots, \eta_{n-1})$. While this could be done on any policy, it would
normally be completely impractical, since we would not know the net amounts at risk in
advance. In this case, however, it works perfectly. We are given that $b_k = 1 + {}_{k+1}V$, so that
$\eta_k = 1$ for all k. This leads to

$$P\,\ddot{a}(1_n; v) = \sum_{k=0}^{n-1} v(k+1)\, q_{x+k},$$

and we easily solve for P.
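
As a numerical illustration of Example 6.8, the sketch below solves for P and then confirms, by the interest-only recursion with a net amount at risk of 1, that the reserve returns to zero at time n. The mortality and interest rates are invented for the illustration and do not come from the text; the function names are likewise our own.

```python
from fractions import Fraction as F

def premium_one_plus_reserve(q, i):
    """Level premium when the death benefit is 1 plus the reserve (n = len(q))."""
    v = [F(1)]                                    # v[k]: interest-only discount to time k
    for rate in i:
        v.append(v[-1] / (1 + rate))
    n = len(q)
    annuity = sum(v[k] for k in range(n))             # interest-only annuity of 1 for n years
    cost = sum(v[k + 1] * q[k] for k in range(n))     # net amount at risk is 1 every year
    return cost / annuity

def reserves(P, q, i):
    """Reserves from the interest-only recursion with eta_k = 1."""
    V = [F(0)]
    for k in range(len(q)):
        V.append((V[-1] + P) * (1 + i[k]) - q[k])
    return V

q = [F(1, 10), F(1, 5), F(3, 10)]   # illustrative mortality rates (assumed)
i = [F(1, 4)] * 3                   # illustrative constant interest rate of 25% (assumed)

P = premium_one_plus_reserve(q, i)
V = reserves(P, q, i)
```

No circularity arises: P is found first from interest-only quantities, and the death benefits $1 + {}_{k+1}V$ are then read off from the list `V`.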


For another application of this idea, we can give a more formal solution to Example 5.8. 
We calculate the balance at time 25 and equate it to the reserve at time 25, which we know 
is $\ddot{a}_{[40]+25}$. To calculate the balance, we use the discount function v and death benefits of $\eta_k$.
In this case, the actual death benefit is just equal to the reserve, so that $\eta_k = 0$ for all k. The
balance is just the accumulated value of the premiums at interest, which is $P\,v(25, 0)\,\ddot{a}(1_{25}; v)$,
as we had before. 

Note that reserve calculation by recursion is quite simple in this type of policy. One simply 
uses (6.8) where now $\eta_k$ is just the face amount of the policy.




6.8 Premium difference and paid-up formulas 


Consider any policy on (x) with death benefit vector b and with level annual premiums of 
P payable for h years, so that the premium vector $\pi = P(1_h)$. In this case, there are some
other formulas that are useful for providing additional insight into the nature of balances and 
reserves. Throughout this section P is arbitrary and not necessarily the net premium. 


6.8.1 Premium difference formulas 


Fix a duration k < h. Let $P_k$ be the level premium that should be charged for a policy with the
same remaining benefits, if issued at time k to a person age x + k. That is, the value at time
k of these new premiums should equal the value at time k of the death benefits after time k.
Equating values at time k,

$$A_{[x]+k}(b \circ k) = P_k\,\ddot{a}_{[x]+k}(1_{h-k}). \tag{6.12}$$

Since ${}_kV = A_{[x]+k}(b \circ k) - P\,\ddot{a}_{[x]+k}(1_{h-k})$, we substitute from (6.12) to get

$${}_kV = (P_k - P)\,\ddot{a}_{[x]+k}(1_{h-k}). \tag{6.13}$$

Formula (6.13) is known as the premium difference formula for reserves. The quantity $P_k - P$
is the difference between what should be charged and what is actually charged after time k.
The value at time k of this yearly deficit over the remaining premium payment period gives
the reserve.

We can also obtain a retrospective premium difference formula. Let $\tilde{P}_k$ be the premium
that could have been charged to provide the benefits that were provided up to time k. That is,

$$A_x({}_kb) = \tilde{P}_k\,\ddot{a}_x(1_k). \tag{6.14}$$

Since $B_k = [P\,\ddot{a}_x(1_k) - A_x({}_kb)]/y_x(k)$, we substitute from (6.14) to get

$$B_k = \frac{(P - \tilde{P}_k)\,\ddot{a}_x(1_k)}{y_x(k)}. \tag{6.15}$$

The quantity $P - \tilde{P}_k$ is the difference between what was actually charged and what could
have been charged up to time k. The accumulated amount of this yearly excess gives the
balance. When P is a net premium, $B_k = {}_kV$ and (6.15) gives another formula for the reserve.


6.8.2 Paid-up formulas 


Substituting for the annuity rather than the insurance in (6.12) gives

$${}_kV = \left(1 - \frac{P}{P_k}\right) A_{[x]+k}(b \circ k). \tag{6.16}$$

Similarly, from (6.15),

$$B_k = \left(\frac{P}{\tilde{P}_k} - 1\right) \frac{A_x({}_kb)}{y_x(k)}. \tag{6.17}$$

To interpret formula (6.16), note that the fraction of the future benefit that is purchased
by the actual premiums is $P/P_k$. For example, if $P_k = 36$ and $P = 24$ then the policyholder
is only paying two-thirds of what they should be for the future benefits. The difference of
$(1 - P/P_k)$ must be that portion of the future benefits that has already been provided by the
excess past premiums, so that multiplying this ratio by the present value of future benefits
gives the reserve. In formula (6.17) the ratio $P/\tilde{P}_k - 1$ is that portion of the past benefits that
was purchased but not needed. For example, if $P = 24$ and $\tilde{P}_k = 16$, then the policyholder
has paid 1.5 times what they could have paid to provide those benefits. Multiplying this ratio
by the accumulated value of these past benefits gives the balance.

Formula (6.16) is known as the paid-up formula. Suppose a policyholder lapses at time
k and is given nonforfeiture benefits equal in value to the reserve. If the individual elects to
take paid-up insurance for a reduced amount, this formula shows that the appropriate fraction
is $(1 - P/P_k)$.
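
The four formulas of this section can be checked against the term policy of Example 6.6 at duration k = 1, where the reserve is known to be 5. The Python sketch below is illustrative; the variable names (in particular `P_tilde` for the retrospective premium) are our own labels.

```python
from fractions import Fraction as F

v, q60, q61 = F(1, 2), F(1, 5), F(2, 5)   # basis of Example 6.1: 100% interest
P = 10                                     # net level premium of Example 6.6 (h = 2)

# Prospective side at time 1: one premium and one year of benefits remain.
future_benefits = v * q61 * 75             # value at time 1 of the remaining death benefit
future_annuity = F(1)                      # one remaining premium, due at time 1
P_k = future_benefits / future_annuity     # premium that should be charged, as in (6.12)

reserve_premium_difference = (P_k - P) * future_annuity    # (6.13)
paid_up_fraction = 1 - P / P_k
reserve_paid_up = paid_up_fraction * future_benefits       # (6.16)

# Retrospective side at time 1: one premium paid, one year of benefits provided.
y1 = v * (1 - q60)                         # interest and survivorship discount to time 1
past_benefits = v * q60 * 80               # value at time 0 of the first-year death benefit
past_annuity = F(1)                        # one premium, paid at time 0
P_tilde = past_benefits / past_annuity     # premium that could have been charged, as in (6.14)

balance_premium_difference = (P - P_tilde) * past_annuity / y1   # (6.15)
balance_paid_up = (P / P_tilde - 1) * past_benefits / y1         # (6.17)
```

All four expressions agree with the reserve of 5, and a policyholder lapsing at time 1 could take paid-up insurance for one-third of the remaining benefit.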


6.8.3 Level endowment reserves 


We can use the premium difference formula to derive a very simple expression for reserves on 
level endowment insurance where a net level premium is paid for the full period and interest 
is constant. At duration k of an n-year contract, we know from (5.10) that 


$$P = \frac{1}{\ddot{a}_x(1_n)} - d, \qquad P_k = \frac{1}{\ddot{a}_{[x]+k}(1_{n-k})} - d,$$


and, substituting in (6.13),


$${}_kV = 1 - \frac{\ddot{a}_{[x]+k}(1_{n-k})}{\ddot{a}_x(1_n)}, \tag{6.18}$$


reducing the calculation of reserves on such a policy to the calculation of annuity values. 
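
For readers who wish to see the intermediate step, the substitution can be written out as follows. This is a sketch of the algebra under the stated assumptions (net level premium, constant interest); the discount rate d cancels because it appears in both premiums.

```latex
\begin{aligned}
{}_kV &= (P_k - P)\,\ddot{a}_{[x]+k}(1_{n-k}) \\
      &= \left(\frac{1}{\ddot{a}_{[x]+k}(1_{n-k})} - d
               - \frac{1}{\ddot{a}_x(1_n)} + d\right)\ddot{a}_{[x]+k}(1_{n-k}) \\
      &= 1 - \frac{\ddot{a}_{[x]+k}(1_{n-k})}{\ddot{a}_x(1_n)}.
\end{aligned}
```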


6.9 Standard notation and terminology 


The standard notation for reserves closely follows the notation for annual premiums as 
described in Section 5.8. The basic symbol for the reserve at time k is ${}_kV$, as we have adopted.
This is embellished in exactly the same way as the symbol P was for annual premiums, with 
one exception. The premium payment period is moved to the upper left, from the lower left, 
since the latter is now used for the duration. If the upper left is empty, it signifies again that 
level premiums are paid for the natural duration of the contract. The following are some 
examples.

${}_kV_x$ is the reserve at time k for a 1-unit whole life policy on (x), with level annual premiums
paid for life. The basic prospective formula (6.4) for this policy in standard symbols reads

$${}_kV_x = A_{x+k} - P_x\,\ddot{a}_{x+k}.$$

The corresponding retrospective formula is

$${}_kV_x = P_x\,\ddot{s}_{x:\overline{k}|} - \frac{A^1_{x:\overline{k}|}}{{}_kE_x}.$$

To simplify retrospective reserve formulas, a symbol for the last term was introduced. Let

$${}_kk_x = \frac{A^1_{x:\overline{k}|}}{{}_kE_x}.$$

This is often called the accumulated cost of insurance and denotes the single premium that
each survivor would pay per unit of death benefit for k years, if this single premium were
collected at the end of the k-year period, rather than at the beginning. (This would never be
done in practice as it is not feasible to charge people at a time when they have no chance of
collecting.)

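${}_tV^1_{x:\overline{n}|}$ is the reserve at time t for a 1-unit, n-year term policy on (x) with level annual
premiums payable for n years. For t < n, the prospective formula for this quantity is

$${}_tV^1_{x:\overline{n}|} = A^1_{x+t:\overline{n-t}|} - P^1_{x:\overline{n}|}\,\ddot{a}_{x+t:\overline{n-t}|},$$

while the retrospective formula is

$${}_tV^1_{x:\overline{n}|} = P^1_{x:\overline{n}|}\,\ddot{s}_{x:\overline{t}|} - {}_tk_x.$$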

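${}^h_tV_{x:\overline{n}|}$ is the reserve at time t for a 1-unit, n-year endowment insurance policy on (x), with
level annual premiums payable for h years. For t < n, this is given prospectively as

$${}^h_tV_{x:\overline{n}|} = \begin{cases} A_{x+t:\overline{n-t}|} - {}_hP_{x:\overline{n}|}\,\ddot{a}_{x+t:\overline{h-t}|} & \text{if } t < h, \\ A_{x+t:\overline{n-t}|} & \text{if } t \ge h, \end{cases}$$

or retrospectively as

$${}^h_tV_{x:\overline{n}|} = \begin{cases} {}_hP_{x:\overline{n}|}\,\ddot{s}_{x:\overline{t}|} - {}_tk_x & \text{if } t < h, \\ {}_hP_{x:\overline{n}|}\,\dfrac{\ddot{s}_{x:\overline{h}|}}{{}_{t-h}E_{x+h}} - {}_tk_x & \text{if } t \ge h. \end{cases}$$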


${}_kV({}_{n|}\ddot{a}_x)$ denotes the reserve at time k for a deferred annuity on (x) providing 1 unit for
life beginning at age x + n and with annual premiums payable for n years. For k < n, the
retrospective formula is the easiest and is given by

$${}_kV({}_{n|}\ddot{a}_x) = P({}_{n|}\ddot{a}_x)\,\ddot{s}_{x:\overline{k}|}.$$

For k ≥ n, the prospective formula is the easiest and is given by

$${}_kV({}_{n|}\ddot{a}_x) = \ddot{a}_{x+k}.$$


6.10 Spreadsheet applications 


For reserve calculations we add two columns J and K to the Chapter 5 spreadsheet. In cell J10,
we enter

= I10 - F10 - L10

and copy down to get the net cash flow vector. Then in cell K10 we enter the formula

= -SUMPRODUCT(E10:E$129, J10:J$129)/E10,

and copy down.

This calculates ${}_kV$ in cell K(10+k), as the negative of the value of future net cash flows,
divided by $y_x(k)$. (Division by 0 error terms will appear for large enough durations, but these
occur after age ω and can be ignored. If one prefers, the formula in K10 can be suitably
modified with an IF statement to replace them with a blank.) As a test calculate ;5V for the 
test problem given in Section 5.9. The answer is 333.16. 
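
For readers working outside a spreadsheet, the column K calculation can be mirrored in a few lines of Python. This is only a sketch of the same arithmetic; the function name `reserves_from_cash_flows` is our own, and the test data are the net cash flow vectors of Examples 6.6 and 6.1 rather than the Section 5.9 test problem.

```python
from fractions import Fraction as F

def reserves_from_cash_flows(y, net_cash_flows):
    """Mirror of the column K formula: -SUMPRODUCT(y[k:], f[k:]) / y[k]."""
    out = []
    for k in range(len(y)):
        if y[k] == 0:
            out.append(None)       # plays the role of the IF statement that blanks the cell
        else:
            future = sum(yj * fj for yj, fj in zip(y[k:], net_cash_flows[k:]))
            out.append(-future / y[k])
    return out

y = [F(1), F(2, 5), F(3, 25)]      # interest and survivorship discount, 100% interest

term_reserves = reserves_from_cash_flows(y, [2, -5, 0])         # f of Example 6.6
endowment_reserves = reserves_from_cash_flows(y, [8, 1, -70])   # Example 6.1 with endowment
```

In the endowment case the entry -70 at time 2 is the pure endowment treated as an annuity benefit, so the final reserve equals the amount about to be paid.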

We now have a final spreadsheet that will calculate premiums and reserves on all insurance 
and annuity contracts without guaranteed payments. Here is a complete summary. 


INPUT FORMULAS 


Column D: 1 in cell D10, Section 2.14 formula in cell D11. Copy down. 
Column E: 1 in cell E10, Section 4.8 formula in cell E11. Copy down. 
Column F: Section 4.8 formula in cell F8. 

Column G: Section 5.9 formula in cell G8. 

Column H: Copy cell F8 to cell H8. 


Column I: Insert the formula =(F8 + G8)/H8 in I6. Copy cell H8 to I8. Insert the formula
=$I$6*H10 in I10 and copy down.


Column J: Copy down the formula for cell J10. 
Column K: Copy down the formula for cell K10. 


INPUT DATA FOR EACH PARTICULAR PROBLEM 


Column B: Interest rates. 

Cell C1: Age at issue. 

Column F: Annuity benefit vector c. 
Column G: Death benefit vector b. 
Column H: Premium pattern vector p. 


Column N: Life table. (For sample table, insert parameters in N3 and N4, Section 4.8
formula in N10. Copy down).


OUTPUT


$\ddot{a}_x(c)$ in F8.
$A_x(b)$ in G8.
Present value of all premiums in I8.


The premium vector π in column I.


Reserves in column K. 


Exercises 


Type A exercises 


6.1 A 3-year endowment insurance on (60) provides for death benefits payable at the end of
the year of death. The death benefit is 1000 for death in the first year, 2000 for death in the
second year and 3000 for death in the third year. In addition, there is a pure endowment
of 4000 paid at time 3 if the insured is then alive. This is purchased by three annual
premiums, beginning at age 60. The first two premiums are equal and the third is double
the amount of the initial premium. You are given $q_{60} = 0.10$, $q_{61} = 0.20$, $q_{62} = 0.25$.
The interest rates are 25% for the first 2 years and 100% in the third year. Find the
initial premium. Find ${}_kV$ for k = 1, 2, 3.


6.2 You are given $q_{60} = 0.20$, $q_{61} = 0.25$. Interest rates are 20% in the first year, 25% in the
second year and 50% in the third year. A 3-year endowment insurance policy issued to
(60) provides for death benefits, at the end of the year of death, of 1000 if death occurs
in the first year and 2000 if death occurs in the second or third year. In addition, there
is a pure endowment of 2000 paid at age 63 if the insured is then alive. Level annual
premiums are payable for 3 years. Find the premium. Find ${}_kV$ for k = 1, 2, 3. Do this
first by the basic prospective reserve formula, and check your answers by using the
recursion formula (6.7).


6.3 For a 3-year term insurance policy on (x), level premiums are payable for 3 years. The
following data are given.

k    $q_{x+k}$    $i_k$    $b_k$
0    0.10    0.20    50000
1    0.15    0.25    20000
2    0.20    0.30    15000

(a) Find the vector $y_x$.

(b) Find the net cash flow vector.

(c) Using your answers to (a) and (b), compute ${}_1V$ and ${}_2V$.

(d) Explain briefly why your answers to (c) are negative.


6.4 A 10-year endowment policy on (40) has death benefits of 900, payable at the end of
the year of death, should this occur within 10 years, plus a pure endowment of 900
at age 50 if the insured is then alive. Level annual premiums of 20 are payable for
10 years. The interest rate is a constant 50% and $q_{48} = 0.25$. You are not given $q_{49}$.
Find ${}_tV$ for t = 8, 9, 10.


6.5 An insurance policy has level premiums of 20 payable for the duration of the contract.
If the reserve is calculated with a premium of 15, then ${}_9V = 100$. What is ${}_9V$ if the
reserve is calculated with a premium of 16?


6.6 Refer to Exercise 6.2.

(a) Decompose each premium into the risk portion and savings portion.

(b) Suppose that during the second year of the policy the actual interest rate earned
was 20% rather than 25% and the actual rate of mortality was 0.20 rather than 0.25.
Find the per policy gain during this year from both interest and mortality.


6.7 Refer to Exercise 6.1. Decompose each of the three premiums into the risk portion and
savings portion.


6.8 For a certain contract issued at age 60, ${}_5V = 200$, the premium payable at age 65 is 40,
$q_{65} = 0.20$, the death benefit payable at age 66 for death between age 65 and 66 is 800, and
the interest rate for the sixth year of the contract (between age 65 and 66) is 20%.

(a) Find ${}_6V$.

(b) Decompose the premium payable at age 65 into the risk portion and the savings
portion.

(c) Suppose that during the sixth year, the actual interest rate earned was 25% instead
of 20% and the actual rate of mortality at age 65 was 0.15 instead of 0.20. Find the
gain or loss during this year, from interest and from mortality, for each policy in
existence at the beginning of the year.


Type B exercises 


6.9 Refer again to Exercise 6.7. Explain briefly in words why the third premium has a
negative risk portion.


6.10 A deferred life annuity on (60) provides for annuity benefits payable for 3 years,
beginning at age 63, provided that (60) is alive. The first annuity payment is 1000,
the second is 2000 and the third is 3000. Premiums are payable for 3 years, beginning
at age 60. The second and third premiums are equal in amount and each double the
amount of the initial premium. If (60) dies before age 63, there will be, at the end of
the year of death, a return of all premiums paid prior to death, without interest. You
are given that $q_{60} = 0.1$, $q_{61} = 0.2$, $q_{62} = 0.25$, $q_{63} = 0.3$, $q_{64} = 0.4$. The interest rates
are 25% per year for the first 4 years, and 50% for the fifth year. Find (a) the initial
premium, (b) ; V and (c) 4V. 


6.11 A life insurance policy on (x) has level death benefits of 1000. A life insurance on
(y) has exactly the same premiums and reserves as the policy on (x) for the first 10
years, but different death benefits. Suppose that $q_{y+k} = 2q_{x+k}$ for k = 0, 1, ..., 9. If the
common reserve ${}_8V = 300$, what is the death benefit on (y)'s policy for the year running
from time 7 to time 8?


6.12 Use formula (6.8) to derive formula (4.9) with k = 1. (Note that for annuity contracts
the death benefits are 0, and the premiums are the negative of the annuity benefits.)


6.13 You are given that $\ddot{a}_{81} = 3.2$, $q_{80} = 0.10$, i = 0.20.

(a) Find $\ddot{a}_{80}$.

(b) A single-premium whole life annuity on (80) provides for a constant benefit of 1
per year. Due to improvements in mortality, the actual mortality rate experienced
during the first year of the contract is 0.07 rather than 0.1. On the other hand, the
interest earned during that year was 22% rather than 20%. What is the total per
contract gain for this year?


6.14 A 2-year term insurance policy on (50), with a constant death benefit of 1000, is
purchased by two level annual premiums. Policyholders who choose to lapse the policy
at time 1 will receive ${}_1V$ as a cash value. Assume that mortality and interest follow the
projected pattern, and that the premium charged is the net premium. Suppose, however,
that a typical group of people age 51 who have purchased insurance at age 50 can be
divided into two groups. Half of them are healthy and half are not. The non-healthy
group can expect to have twice as many deaths over the following year as the healthy
group. Suppose that $q_{51} = 0.03$. What is the loss on each remaining policy for the
second year, assuming:

(a) All of the healthy policyholders at age 51 lapse the policy at that time, and none of
the unhealthy ones do;

(b) One-half of the healthy policyholders, and one-quarter of the unhealthy policyholders,
lapse the policy at age 51.


6.15 For a certain whole life policy on (x), given by a death benefit vector b and premium
pattern vector p, reserves are calculated according to two mortality tables, with the
1-year probabilities of dying denoted by $q_{x+k}$ and $q^*_{x+k}$ respectively. The same positive
interest rates are used in both calculations, and in each case the premiums are the net
premiums as determined by the particular table being used. Suppose the two ‘curves’
cross at one point. That is

$$q_{x+k} \ge q^*_{x+k}, \quad k = 0, 1, \dots, n - 1,$$
$$q_{x+k} \le q^*_{x+k}, \quad k = n, n + 1, \dots, \omega - x - 1.$$

Assume that death benefits are nonincreasing. Show that the reserve at time n is higher
for the starred rates (the steeper curve).


6.16 (a) Use (6.8) to derive the formula

$$A_{x+1}(1_{n-1}) - A_x(1_n) = iA_x(1_n) - q_x\bigl(1 - A_{x+1}(1_{n-1})\bigr).$$

(b) As an actuary for an insurance company, you receive an angry letter from a
policyholder. The person, age 50, has just purchased a single-premium 3-year term
insurance policy with a constant death benefit of 1. The complaint is that the
person's friend, age 49, purchased a single-premium 4-year term insurance policy,
with the same death benefits, for a lower single premium. The letter writer argues
that both policies provide coverage for the ages 50 to 53, but that the friend gets an
extra year of coverage, for the year running from age 49 to 50. Therefore, it is
claimed, the friend's premium should be higher, not lower. What is your response?


6.17 A special 2-year term insurance policy on (70) is to be purchased by a single premium.
Should death occur in the first year, the insured will receive, at the end of the year, 1000
plus the reserve at that time. If death occurs in the second year, the insured will receive
at time 2 only a return of the single premium paid without interest. Suppose $q_{70} =
0.36$, $q_{71} = 0.40$, and the interest rate is 100%. Find (a) the single premium, (b) ${}_1V$.


6.18 A whole life insurance policy on (x) provides for death benefits, paid at the end of the
year of death, of 1000 plus the reserve at that time. Level annual premiums of P are
payable for life. Suppose that $q_x = 0.2$, $q_{x+1} = 0.2$ and $q_y = 0.4$ for all $y \ge x + 2$. The
interest rate is a constant 100%. Find (a) P, (b) ${}_kV$, k = 1, 2, ....


6.19 Two people, A and B, are both age x. A buys a 20-year endowment policy, with a
constant death benefit of 1, and a pure endowment of 1 at time 20, paying level annual
premiums for 20 years. B buys a whole life policy with a constant death benefit of 1,
paying level annual premiums for life, and in addition, each year, invests the difference
between his/her premium and A's premium in a savings account. At time 20, the
reserve on B's policy, plus the amount in his/her savings account, total 1. All premiums
were calculated at a constant interest rate of i. B earned a constant rate of j on his/her
investments. Is j greater than, less than or equal to i? Why?


6.20 A whole life policy with a level death benefit of 10 000 carries a net annual premium of
100 payable for life. The rate of discount is a constant 0.04. At time n, the policyholder
wishes to reduce their annual premium payment to 25, and is told they can do so, but
the death benefit will be reduced to 7000. Assuming that the entire reserve is available
as a nonforfeiture benefit, find ${}_nV$.


6.21 Suppose that a mortality table used to calculate reserves is altered by adding a positive
constant to each value of $q_x$. Explain how the reserves will be affected, for whole life
policies with a constant death benefit and level premiums.


Spreadsheet exercise 


6.22 Suppose that mortality follows the sample life table of Section 3.7 and interest rates are
5% in the first 15 years, 6% for the next 15 years, and 7% thereafter. A contract issued
at age 40 provides for a life annuity beginning at age 65. The annual annuity payment is
1000 for the first 10 years and 2000 thereafter for life. If death occurs before annuity
payments begin, there is a death benefit of 10 000 paid at the end of the year of death.
Level annual premiums are payable for 15 years.


(a) Find the premium and all reserves. 


(b) Suppose the interest rate in the first 15 years increases from 5% to 5.5%. Do 
reserves increase or decrease? 


(c) If the interest rate in the first 15 years decreases to 4%, what happens to reserves? 


Fractional durations 


7.1 Introduction 


Up to now, we have considered cash flows where the payments were at integer times. We 
now deal with the case where cash flows can occur at fractional durations. For example, if the 
basic time unit was a year, and payments were made monthly, these would be at times that 
are multiples of 1/12. In practice this is a common occurrence. Purchasers of life annuities 
often want the income to be paid monthly. Many purchasers of insurance policies wish to pay 
premiums monthly, or possibly quarterly or semiannually. One obvious method of handling 
this feature is to change the time unit. If we are dealing with annuities with monthly payments, 
we could just take our unit of time as 1 month, and payments would be at integer times. This 
option was not always feasible in pre-computer days. For cash flows discounted at interest 
only, calculations were done from tables showing yearly rates of interest, and elaborate 
formulas were developed to handle the fractional durations. This is no longer necessary, and 
the change-of-period approach is the modern way to handle the valuation of cash flows at 
compound interest. 

For life annuities however it is still common to keep the time unit as a year. There are 
several reasons for this. In the first place, the year has been so ingrained as a measure of 
age that it seems hard to break away from this tradition. Another more important reason 
is that insurers often do not calculate premiums exactly for the different periods, but rely 
on approximate conversion formulas. For this purpose it is convenient to have some simple 
relationships established between the yearly annuity values and those with more frequent 
payments. Still another reason is that we also want to consider annuities with payments made 
continuously, which will be discussed in Chapter 8. To do this, it is convenient to first discuss 
the case in which payments are made m times per year, and then view continuous annuities 
as a limiting case as m approaches infinity. 

We will follow the standard usage of referring to annuities with payments made at times 
which are multiples of 1/m as mthly annuities or annuities payable mthly. 

We will restrict attention to sequences of cash flows where the periodic payments during 
each year are constant, but can vary from year to year. If payments change during the year 
(e.g., they increase each month), then usually the only feasible approach is to change the 
period. The basic model is as follows. We are given a positive integer m, which is the number 
of payments made each year. We are given a cash flow vector $\mathbf{c} = (c_0, c_1, \ldots, c_N)$ where, as 
before, $c_k$ is the total amount paid in year k + 1, that is, between time k and k + 1. Rather than 
being paid as a single cash flow, however, there are payments of 

$$\frac{c_k}{m} \ \text{at time}\ k + \frac{j}{m}, \qquad j = 0, 1, \ldots, m - 1. \tag{7.1}$$


Let y denote an arbitrary discount function. We denote the present value of the sequence 
of cash flows as described by $\ddot{a}^{(m)}(\mathbf{c}; y)$. The task at hand is to relate this quantity to the case 
where m = 1, namely $\ddot{a}(\mathbf{c}; y)$. Our first observation is that it is sufficient to find a formula for 
$\ddot{a}^{(m)}(\mathbf{c}; y)$ in the case that $\mathbf{c} = \mathbf{e}^0$, a single payment of 1 at time zero. This follows from the 
replacement principle. Let $u_k$ be the value at time k of a sequence of payments of 1/m at time 
k + (j/m), j = 0, 1, ..., m - 1. We can write 

$$u_k = \ddot{a}^{(m)}(\mathbf{e}^0; y \circ k).$$

The replacement principle tells us that we can replace all the payments between time k and 
k + 1 by a single payment of $c_k u_k$ at time k. It follows that 

$$\ddot{a}^{(m)}(\mathbf{c}; y) = \ddot{a}(\mathbf{c} * \mathbf{u}; y), \tag{7.2}$$

where $\mathbf{u} = (u_0, u_1, \ldots, u_k, \ldots)$. 
The key to evaluating fractional annuities is to calculate the quantities $u_k$. We will discuss 
particular cases in the following two sections. 


7.2 Cash flows discounted with interest only 


Suppose that we are given yearly interest rates, and we want to consider a change of period to 
1/m of a year. We use primed symbols to denote the quantities applicable to the new period, 
that is, $v'$, $i'$ and $d'$ in place of v, i and d, respectively. We will start as usual by supposing 
that we are given the interest rates $i_k$ for each nonnegative integer k. To determine the rates 
applicable to the new periods, the standard assumption made is that interest rates are constant 
over each year. To be precise, one postulates that for all nonnegative integers k, (2.9) holds 
whenever s, t, s + h, t + h all lie in the interval [k, k + 1). 

Let $i'_k$ denote the constant value of v(s + 1/m, s) - 1 and let $d'_k$ denote the constant value of 
1 - v(s, s + 1/m) for $k \le s < s + 1/m \le k + 1$. An investment of 1 at time k will accumulate 
to $1 + i'_k$ at time k + 1/m and then to $(1 + i'_k)^2$ at time k + 2/m, $(1 + i'_k)^3$ at time k + 3/m and so on. 
Arguing in the same way for the discount rate gives 

$$(1 + i'_k)^m = 1 + i_k, \tag{7.3}$$
$$(1 - d'_k)^m = 1 - d_k, \tag{7.4}$$

since in (7.3) each side of the equation is the value at time k + 1 of 1 unit invested at time k, 
and in (7.4) each side of the equation is the value at time k of 1 unit paid at time k + 1. 

In standard actuarial notation it is common to 'annualize' these rates by multiplying by 
m. That is, one defines 

$$i_k^{(m)} = m\,i'_k, \qquad d_k^{(m)} = m\,d'_k. \tag{7.5}$$

The quantities $i_k^{(m)}$ and $d_k^{(m)}$ are often known as nominal rates, to reflect the fact that they are 
rates in 'name only', and not really applicable to any period. They are sometimes referred 
to as the yearly interest (respectively discount) rates convertible or compounded mthly. The 
actual rate applicable to each mthly period in the year k to k + 1 is found by dividing these 
nominal rates by m. 
In the case of constant interest the subscript k is not needed and we simply use the symbols 
$i^{(m)}$ and $d^{(m)}$ for the nominal rates. 


Example 7.1 Suppose that $i^{(m)} = 0.12$. Find i, the yearly rate of interest, if (a) m = 2, (b) 
m = 12. 

Solution. 

(a) We must divide by 2, to deduce that the rate per half-year period is 0.06. From (7.3), 
$1 + i = 1.06^2 = 1.1236$, so that the corresponding yearly interest rate is 12.36%. 

(b) Similarly, the corresponding yearly interest rate is $(1 + 0.01)^{12} - 1 = 12.68\%$. 
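
The conversion in Example 7.1 is easy to check numerically. The following is a minimal sketch in Python; the function names `effective_annual_rate` and `nominal_rate` are our own labels, chosen for illustration only.

```python
def effective_annual_rate(nominal: float, m: int) -> float:
    """Yearly rate i equivalent to a nominal rate i^(m) convertible m times a year."""
    per_period = nominal / m                 # i' = i^(m) / m
    return (1.0 + per_period) ** m - 1.0     # (1 + i')^m = 1 + i, as in (7.3)


def nominal_rate(effective: float, m: int) -> float:
    """Inverse conversion: i^(m) = m * ((1 + i)^(1/m) - 1)."""
    return m * ((1.0 + effective) ** (1.0 / m) - 1.0)


print(round(effective_annual_rate(0.12, 2), 4))    # part (a) of Example 7.1
print(round(effective_annual_rate(0.12, 12), 4))   # part (b) of Example 7.1
```

The two printed values should agree with the rates of 12.36% and 12.68% found in the example.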


It is important to distinguish between the actual yearly rates and nominal rates for different 
periods, when comparing quoted figures. For example, by custom, the rates on Canadian 
mortgages are quoted as nominal rates with m = 2, while for US mortgages, m = 12. These 
provisions would be evident in the actual written contract, but are not always made clear 
to a prospective borrower. As the above example indicates, the actual annual rates paid on 
Canadian or US mortgages would be higher than the quoted nominal rates. 

We now deduce the present value of fractional annuities for the investment discount 
function. To simplify the notation, we first assume constant interest. A 1-year, 1-unit annuity 
payable mthly is simply an annuity paying 1/m units for m periods, discounted with the 
primed rates. From formulas (2.16) and (7.4) we can write 

$$u_k = \ddot{a}^{(m)}(\mathbf{e}^0; v) = \frac{1}{m}\left[\frac{1 - (1 - d')^m}{d'}\right] = \frac{d}{d^{(m)}} \tag{7.6}$$

for all k, where the last quantity is taken as 1 if d = 0. From (7.2), 

$$\ddot{a}^{(m)}(\mathbf{c}; v) = \frac{d}{d^{(m)}}\,\ddot{a}(\mathbf{c}; v).$$

For the general case of non-constant interest we have $u_k = d_k / d_k^{(m)}$, which is taken equal 
to 1 if $d_k^{(m)} = 0$. 

Note that in the case of positive cash flows (and assuming the usual case of positive interest 
rates) the present value with payments paid mthly is less than the corresponding yearly present 
value. This can be shown mathematically by noting that $d_k$ is less than $d_k^{(m)}$. For example, if 
$d_k = 0.36$ then, from (7.4) with m = 2, $d'_k = 1 - 0.64^{1/2} = 0.20$ and $d_k^{(2)} = 0.40$. (The reader is invited to 
provide a general proof.) This conclusion is also evident from a comparison of payments. In 
the yearly case, a full payment of $c_0$ would be made at time 0. In the mthly case, the annuitant 
receives $c_0/m$ each period of length 1/m, and will not have collected the full $c_0$ amount until 
one mth of a year before the year end. This is true for each year of the annuity. There is 
therefore a loss of interest, which is reflected in the lower present value for the mthly case. 
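
As a numerical check on (7.6), the factor $d/d^{(m)}$ can be compared with a direct discounting of the m payments of 1/m. The sketch below assumes constant interest within the year; the function names are hypothetical and are not part of the standard notation.

```python
def d_nominal(d: float, m: int) -> float:
    """Nominal discount rate d^(m) = m * d', with d' found from (1 - d')^m = 1 - d."""
    return m * (1.0 - (1.0 - d) ** (1.0 / m))


def u_closed_form(d: float, m: int) -> float:
    """u_k = d / d^(m) as in (7.6), taken as 1 when d = 0."""
    return 1.0 if d == 0.0 else d / d_nominal(d, m)


def u_direct(d: float, m: int) -> float:
    """Discount m payments of 1/m made at times 0, 1/m, ..., (m - 1)/m."""
    v = 1.0 - d  # yearly discount factor
    return sum(v ** (j / m) for j in range(m)) / m


d, m = 0.36, 2
print(d_nominal(d, m))                       # nominal rate d^(2) for the example in the text
print(u_closed_form(d, m), u_direct(d, m))   # two routes to the same factor u_k
```

With d = 0.36 and m = 2 both routes should give a factor below 1, which is the loss of interest described above.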


7.3 Life annuities paid mthly 


We now consider the case where we have the interest and survivorship discount function $y_x$. 
We will usually write $\ddot{a}_x^{(m)}(\mathbf{c})$ for $\ddot{a}^{(m)}(\mathbf{c}; y_x)$ as we did when m = 1. 

As we are normally given mortality data only for integer ages, we are faced with the 
necessity of making some assumption in order to extrapolate for the data that we do not 
have. Many approaches are possible. In the next section we describe the most commonly used 
method. Other possibilities that have been proposed are described in Exercises 7.6, 7.13 and 
7.14, as well as later in Chapter 8. 


7.3.1 Uniform distribution of deaths 


This preferred method is known as the assumption of a uniform distribution of deaths over 
each year of age, abbreviated as UDD. It can be viewed as just linear interpolation for $\ell$. 
Suppose we assume that $\ell_y$, rather than being defined just for integer values, is defined for all 
nonnegative y. The UDD assumption is that for any integer x and 0 < t < 1, 

$$\ell_{x+t} = (1 - t)\,\ell_x + t\,\ell_{x+1}.$$

The name comes from the fact that given $0 \le t < t + h \le 1$, 

$$\ell_{x+t} - \ell_{x+t+h} = h(\ell_x - \ell_{x+1}) = h\,d_x.$$

In other words, during any interval of h years which lies between two integral ages, the number 
of deaths occurring in that period is just h times the total number of deaths for the yearly 
interval. We can therefore say that the deaths are spread uniformly over the year. 

Example 7.2 Suppose $\ell_{60} = 1000$, $\ell_{61} = 940$. Assuming UDD, what is the probability that 
a person age $60\tfrac{1}{3}$ will die before reaching age $60\tfrac{1}{2}$? 

Solution. The number of deaths in the 1/6 year period between $60\tfrac{1}{3}$ and $60\tfrac{1}{2}$ equals 10, 
one-sixth of the total deaths in the year. Out of 1000 lives at age 60, there will be 20 who die 
before reaching age $60\tfrac{1}{3}$ (one-third of the yearly deaths), leaving 980 alive at age $60\tfrac{1}{3}$. The 
probability is therefore 10/980. 
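
The reasoning of Example 7.2 amounts to linear interpolation in the life table. A minimal sketch follows, assuming UDD between the two integer ages; the helper name `l_udd` is hypothetical.

```python
def l_udd(l_x: float, l_x1: float, t: float) -> float:
    """Number alive at age x + t under UDD: linear interpolation between l_x and l_{x+1}."""
    return (1.0 - t) * l_x + t * l_x1


l_60, l_61 = 1000.0, 940.0
alive_start = l_udd(l_60, l_61, 1 / 3)   # lives reaching age 60 1/3
alive_end = l_udd(l_60, l_61, 1 / 2)     # lives reaching age 60 1/2

# Probability that a life aged 60 1/3 dies before reaching age 60 1/2
prob = (alive_start - alive_end) / alive_start
print(alive_start, alive_end, prob)
```

The printed probability should agree with the value 10/980 obtained in the solution.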
An equivalent form of the UDD assumption is 

$${}_t q_x = t\,q_x, \quad \text{for any integer } x \text{ and } 0 < t < 1, \tag{7.7}$$

which follows immediately by writing ${}_t q_x$ as $(\ell_x - \ell_{x+t})/\ell_x$. 
Assuming UDD, we can easily calculate ${}_t p_y$ for all t and y (not necessarily an integer age) 
by the formula 

$${}_t p_{x+h} = \frac{{}_{t+h}\,p_x}{{}_h\,p_x},$$

which follows from the multiplication rule. This points out an inconsistency in the UDD 
assumption. Suppose that the life table follows the expected pattern of increasing mortality 
by age. That is, for all integers x < y we have $q_x < q_y$. Then it can be shown that under UDD, 
${}_t q_x \le {}_t q_y$ for integers $x \le y$ and all t > 0, as we would expect. However, this inequality need 
not hold if x and y are not integers. See Exercise 7.15. 


7.3.2 Present value formulas 


We now derive formulas for fractional duration life annuities by making the UDD assumption 
for mortality and the normal assumption for interest that we used in Section 7.2. This is the 
standard procedure, but the different treatment of the two factors leads to some complications 
in the formulas. We first assume constant interest. For our purpose we require some new 
interest quantities. Let 

$$\beta(m) = \frac{v^{1/m} + 2v^{2/m} + \cdots + (m - 1)v^{(m-1)/m}}{m^2 v},$$

$$\alpha(m) = \frac{d}{d^{(m)}} + d\,\beta(m).$$

We will later derive a simplified expression for $\beta(m)$. 
We have 

$$\ddot{a}_x^{(m)}(\mathbf{e}^0) = \frac{1}{m}\left[1 + v^{1/m}\,{}_{1/m}p_x + v^{2/m}\,{}_{2/m}p_x + \cdots + v^{(m-1)/m}\,{}_{(m-1)/m}p_x\right].$$

From (7.7), ${}_{j/m}p_x = 1 - (j/m)q_x$, so we can write 

$$u_0 = \ddot{a}_x^{(m)}(\mathbf{e}^0) = \frac{1}{m}\left[1 + v^{1/m} + \cdots + v^{(m-1)/m}\right] - \frac{q_x}{m^2}\left[v^{1/m} + 2v^{2/m} + \cdots + (m - 1)v^{(m-1)/m}\right]$$
$$= \ddot{a}^{(m)}(\mathbf{e}^0; v) - \beta(m)\,v\,q_x = \frac{d}{d^{(m)}} - \beta(m)\,v\,q_x,$$

where we refer back to (7.6). Similarly, for any positive integer k, 

$$u_k = \frac{d}{d^{(m)}} - \beta(m)\,v\,q_{x+k}.$$

From (7.2) and (5.6) we can now write 

$$\ddot{a}_x^{(m)}(\mathbf{c}) = \frac{d}{d^{(m)}}\,\ddot{a}_x(\mathbf{c}) - \beta(m)\,A_x(\mathbf{c}). \tag{7.8}$$


Somewhat surprisingly, an insurance term has come into an annuity value formula, but it is 
easily explained. Apart from interest loss, a reason for the mthly annuity to be worth less than 
the yearly annuity is the loss of annuity income in the year of death. The annuitant receiving 
benefits mthly will only receive payments up to the time of death in their final year. For 
example, if they die shortly after the beginning of the year they will only get 1/m of the total 
yearly income. Of course, they could receive the total income by living until the last mth of 
the year. On average, annuitants should lose slightly less than half of the income in the year 
of death, which is reflected by the second term in (7.8). Under UDD the actual proportion is 
$\beta(m)$. This is close to (m - 1)/2m for small interest rates, as shown in Section 7.5 below. 

To derive an alternate expression to (7.8) that uses only annuity present values we apply 
the general insurance-annuity identity (5.8) to the second term in (7.8) and obtain 

$$\ddot{a}_x^{(m)}(\mathbf{c}) = \alpha(m)\,\ddot{a}_x(\mathbf{c}) - \beta(m)\,\ddot{a}_x(\Delta\mathbf{c}). \tag{7.9}$$

In the general case where the interest rate is constant over each year, but can vary from year 
to year, $\alpha(m)$ and $\beta(m)$ will be vectors with the entry corresponding to index k equal to the 
corresponding value for the interest rate $i_k$. The general form of (7.9) will be 

$$\ddot{a}_x^{(m)}(\mathbf{c}) = \ddot{a}_x(\mathbf{c} * \alpha(m)) - \ddot{a}_x(\Delta(\mathbf{c} * \beta(m))).$$

For the particular case of the level payment annuity with constant interest we have 

$$\ddot{a}_x^{(m)}(1_n) = \alpha(m)\,\ddot{a}_x(1_n) - \beta(m)\,(1 - v^n\,{}_n p_x).$$

Example 7.3 A woman age 60 earns a salary of 60 000 per year, payable monthly, and 
would normally expect to receive a raise of 2000 per year until retirement at age 65. She is 
injured and unable to work again. She plans to sue the parties responsible for her injury, for 
an amount equal to the present value of her lost salary. Find a formula to calculate this present 
value. Assume a constant interest rate. 

Solution. We will assume the accident occurred just before her 60th birthday, and that the 
first raise is due 1 year later. The present value is 

$$1000\,\ddot{a}_{60}^{(12)}(\mathbf{c}) = 1000\left[\alpha(12)\,\ddot{a}_{60}(\mathbf{c}) - \beta(12)\,\ddot{a}_{60}(\Delta\mathbf{c})\right],$$

where c = (60, 62, 64, 66, 68), so that $\Delta$c = (60, 2, 2, 2, 2, -68). 
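
Formula (7.9) can be checked against a direct summation of every monthly payment. The sketch below uses the salary vector of Example 7.3, but the mortality rates and the 5% interest rate are hypothetical values chosen only for illustration, and all function names are our own.

```python
def alpha_beta(i, m):
    """alpha(m) and beta(m) from their defining sums, for a constant yearly rate i > 0."""
    v = 1.0 / (1.0 + i)
    d = 1.0 - v
    d_m = m * (1.0 - v ** (1.0 / m))                  # nominal discount rate d^(m)
    beta = sum(j * v ** (j / m) for j in range(1, m)) / (m * m * v)
    alpha = d / d_m + d * beta
    return alpha, beta


def delta(c):
    """Difference vector (c_0, c_1 - c_0, ..., c_{n-1} - c_{n-2}, -c_{n-1})."""
    padded = [0.0] + list(c) + [0.0]
    return [padded[k + 1] - padded[k] for k in range(len(c) + 1)]


def annuity_yearly(c, q, i):
    """Yearly annuity value: payment c_k at time k if the life is then alive."""
    v = 1.0 / (1.0 + i)
    total, surv = 0.0, 1.0
    for k, c_k in enumerate(c):
        total += c_k * v ** k * surv
        if k < len(q):
            surv *= 1.0 - q[k]
    return total


def annuity_mthly_direct(c, q, i, m):
    """Sum each payment c_k/m at time k + j/m, with UDD survival inside the year."""
    v = 1.0 / (1.0 + i)
    total, surv = 0.0, 1.0
    for k, c_k in enumerate(c):
        for j in range(m):
            s = j / m
            total += (c_k / m) * v ** (k + s) * surv * (1.0 - s * q[k])
        surv *= 1.0 - q[k]
    return total


def annuity_mthly_formula(c, q, i, m):
    """Formula (7.9): alpha(m) * a(c) - beta(m) * a(delta c)."""
    alpha, beta = alpha_beta(i, m)
    return alpha * annuity_yearly(c, q, i) - beta * annuity_yearly(delta(c), q, i)


c = [60.0, 62.0, 64.0, 66.0, 68.0]            # salary in thousands, as in Example 7.3
q = [0.010, 0.011, 0.012, 0.013, 0.014]        # hypothetical q_60, ..., q_64
print(1000 * annuity_mthly_direct(c, q, 0.05, 12))
print(1000 * annuity_mthly_formula(c, q, 0.05, 12))
```

Under UDD and constant interest the two printed values should agree up to rounding error, since (7.9) is exact under those assumptions.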
[Figure 7.1: timeline for a quarterly annuity. Due payments of $c_k/4$ fall at times k, k + 1/4, k + 2/4 and k + 3/4; immediate payments of $c_k/4$ fall at times k + 1/4, k + 2/4, k + 3/4 and k + 1.] 

Figure 7.1 Due and immediate quarterly annuity payments 


7.4 Immediate annuities 


We now discuss an important variation. Refer back to (7.1). Suppose that instead of this 
scheme we assume payments of 

$$\frac{c_k}{m} \ \text{at time}\ k + \frac{j}{m}, \qquad j = 1, 2, \ldots, m. \tag{7.10}$$

We can view this as having payments made at the end rather than the beginning of each period 
of length 1/m. Annuities satisfying (7.10) are often referred to as immediate, while those 
satisfying (7.1) are referred to as due. The present value of the annuity with benefit vector 
given by (7.10) and discount function y is denoted by $a^{(m)}(\mathbf{c}; y)$. That is, the two dots are 
removed for immediate annuities. 

The reader may wish to refer back to Section 2.14.1 where we commented on the somewhat 
unusual terminology. We also pointed out that the distinction between the two types is not 
needed in the annual case. However, when we postulate a constant rate of payment over the 
year, we do need to make the distinction in the mthly case. See Figure 7.1 for a comparison 
of the payments on a due and immediate quarterly annuity. 

To go from the due annuity to the immediate annuity, we must do the following. At time 
zero, subtract a payment of $c_0/m$. At time 1, subtract a payment of $c_1/m$ and add a payment 
of $c_0/m$. At time 2, subtract a payment of $c_2/m$ and add a payment of $c_1/m$, etc. We then 
have, for any discount function y, 

$$a^{(m)}(\mathbf{c}; y) = \ddot{a}^{(m)}(\mathbf{c}; y) - \frac{1}{m}\,\ddot{a}(\Delta\mathbf{c}; y). \tag{7.11}$$


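Example 7.4 Find a formula for the present value of a life annuity on (40) consisting of 
payments of $500 at the end of each month for 10 years. Assume a constant interest rate. 

Solution. In this case, 

$$\mathbf{c} = 6000(1_{10}), \quad \Delta\mathbf{c} = 6000(1, 0_9, -1), \quad \ddot{a}_{40}(\Delta\mathbf{c}) = 6000[1 - y_{40}(10)].$$

The present value is 

$$6000\,a_{40}^{(12)}(1_{10}) = 6000\left[\ddot{a}_{40}^{(12)}(1_{10}) - \frac{1}{12}\,(1 - y_{40}(10))\right]$$
$$= 6000\left[\alpha(12)\,\ddot{a}_{40}(1_{10}) - \left(\beta(12) + \frac{1}{12}\right)(1 - y_{40}(10))\right].$$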
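
Relation (7.11) holds for any discount function, so it can be tested without reference to UDD. In the sketch below the discount function `y` (5% interest combined with a constant force of mortality of 0.02) is a hypothetical choice made only for illustration, and the function names are our own.

```python
import math


def delta(c):
    """Difference vector (c_0, c_1 - c_0, ..., c_{n-1} - c_{n-2}, -c_{n-1})."""
    padded = [0.0] + list(c) + [0.0]
    return [padded[k + 1] - padded[k] for k in range(len(c) + 1)]


def pv_due_mthly(c, y, m):
    """Payments of c_k/m at times k + j/m, j = 0, ..., m - 1, as in (7.1)."""
    return sum(c_k / m * y(k + j / m) for k, c_k in enumerate(c) for j in range(m))


def pv_immediate_mthly(c, y, m):
    """Payments of c_k/m at times k + j/m, j = 1, ..., m, as in (7.10)."""
    return sum(c_k / m * y(k + j / m) for k, c_k in enumerate(c) for j in range(1, m + 1))


def pv_yearly(c, y):
    """Yearly payments of c_k at time k."""
    return sum(c_k * y(k) for k, c_k in enumerate(c))


def y(t):
    """Hypothetical discount function: 5% interest, constant force of mortality 0.02."""
    return 1.05 ** (-t) * math.exp(-0.02 * t)


c = [6000.0] * 10   # 500 per month for 10 years, as in Example 7.4
m = 12
lhs = pv_immediate_mthly(c, y, m)
rhs = pv_due_mthly(c, y, m) - pv_yearly(delta(c), y) / m   # right side of (7.11)
print(lhs, rhs)
```

The two printed values should coincide up to rounding, whatever discount function is supplied.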
7.5 Approximation and computation 


In the previous section we developed some basic formulas. We now concentrate on the problem 
of obtaining numerical values. 

For this purpose it is convenient to develop a closed-form formula for $\beta(m)$ in terms of 
standard interest symbols $i^{(m)}$ and $d^{(m)}$. Assume for the moment constant interest. From the 
definition of $\beta(m)$ we can write, using the discount function $v(k) = (1 + i')^{-k}$, 

$$\beta(m) = \frac{1}{m^2}\,\mathrm{Val}_{m-1}(\mathbf{j}),$$

where j = (1, 2, ..., m - 1). From formulas (2.16) and (2.17), 

$$\mathrm{Val}_{m-1}(\mathbf{j}) = (1 + i')^{m-1}\,\ddot{a}(\mathbf{j}) = \frac{[(1 + i')^{m-1} - 1]/d' - (m - 1)}{d'}.$$

We can simplify the numerator since 

$$\frac{(1 + i')^{m-1} - 1}{d'} = \frac{(1 + i')^m - (1 + i')}{i'} = \frac{i}{i'} - 1.$$

Substituting and noting that $m i' = i^{(m)}$, $m d' = d^{(m)}$, we obtain the formula 

$$\beta(m) = \frac{i - i^{(m)}}{i^{(m)}\,d^{(m)}}. \tag{7.12}$$

It then follows that 

$$\alpha(m) = \frac{d}{d^{(m)}} + d\,\beta(m) = \frac{i\,d}{i^{(m)}\,d^{(m)}}.$$

For varying interest we get the same relationships holding on a year-by-year basis. We 
need merely insert subscripts of k on all quantities. 
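
The closed form (7.12) can be compared with the defining sum for $\beta(m)$. The following sketch assumes a constant yearly rate i > 0; the function names are hypothetical.

```python
def beta_from_sum(i, m):
    """beta(m) from its definition: [v^(1/m) + 2 v^(2/m) + ... ] / (m^2 v)."""
    v = 1.0 / (1.0 + i)
    return sum(j * v ** (j / m) for j in range(1, m)) / (m * m * v)


def nominal_rates(i, m):
    """Return (i^(m), d^(m)) for a constant yearly rate i."""
    i_m = m * ((1.0 + i) ** (1.0 / m) - 1.0)
    d_m = m * (1.0 - (1.0 + i) ** (-1.0 / m))
    return i_m, d_m


def beta_closed(i, m):
    """Closed form (7.12); requires i > 0."""
    i_m, d_m = nominal_rates(i, m)
    return (i - i_m) / (i_m * d_m)


def alpha_closed(i, m):
    """alpha(m) = i d / (i^(m) d^(m)); requires i > 0."""
    i_m, d_m = nominal_rates(i, m)
    d = i / (1.0 + i)
    return i * d / (i_m * d_m)


for m in (2, 4, 12):
    print(m, beta_from_sum(0.05, m), beta_closed(0.05, m), alpha_closed(0.05, m))
```

For each m the first two columns should agree. The closed form is convenient for hand calculation, while the sum remains usable at i = 0, where (7.12) becomes 0/0.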
Exact calculations of $\alpha(m)$ and $\beta(m)$ are not always done in practice. It is common to use 
an approximation that assumes a constant interest rate of 0. This implies that 

$$\beta(m) = \frac{1 + 2 + \cdots + (m - 1)}{m^2} = \frac{m - 1}{2m}.$$

Moreover $d/d^{(m)} = 1$ and $\alpha(m) = 1$, so we can write the approximate formula 

$$\ddot{a}_x^{(m)}(\mathbf{c}) \approx \ddot{a}_x(\mathbf{c}) - \frac{m - 1}{2m}\,\ddot{a}_x(\Delta\mathbf{c}). \tag{7.13}$$

In much of the actuarial literature this approximation is not found by setting the interest 
rate equal to 0 as we have done, but rather by using linear interpolation for the discount 
function $y_x$. That is, it is assumed that for all nonnegative integers k and 0 < s < 1, 

$$y_x(k + s) = (1 - s)\,y_x(k) + s\,y_x(k + 1).$$

Under this assumption 

$$\ddot{a}_x^{(m)}(\mathbf{e}^0) = \frac{1}{m}\left[y_x(0) + y_x\!\left(\frac{1}{m}\right) + y_x\!\left(\frac{2}{m}\right) + \cdots + y_x\!\left(\frac{m - 1}{m}\right)\right]$$
$$= \frac{m + 1}{2m} + \frac{m - 1}{2m}\,y_x(1) = 1 - \frac{m - 1}{2m}\,[1 - y_x(1)],$$

which is the same as $\ddot{a}_x^{(m)}(\mathbf{e}^0)$ calculated with the 0-interest approximation (7.13). By (7.2) both assumptions lead to the 
same approximation. The reason why this occurs is that the linearity of $y_x$, in addition to 
the UDD assumption, necessarily implies that the interest rate must be 0. The advantage of 
stating the latter condition from the outset is that it makes it clear that the approximation 
should be reasonable with relatively low interest rates. However, if rates are too large relative 
to mortality rates, this approximation can lead to inconsistencies. 


Example 7.5 Suppose that i = 0.21, and $q_x = 0.004$. Calculate $\ddot{a}^{(2)}(\mathbf{e}^0; v)$ and $\ddot{a}_x^{(2)}(\mathbf{e}^0)$, using 
(7.13) for the latter. 

Solution. 

$$\ddot{a}^{(2)}(\mathbf{e}^0; v) = \tfrac{1}{2}\left[1 + 1.10^{-1}\right] = 0.955.$$

Using (7.13), 

$$\ddot{a}_x^{(2)}(\mathbf{e}^0) = 1 - \frac{1}{4}\left[1 - \frac{0.996}{1.21}\right] = 0.956.$$

It is, however, inconsistent for an annuity in which payments are contingent upon survival to 
cost more than an annuity with identical payments that are certain to be made. 
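
The inconsistency in Example 7.5 can be reproduced in a few lines. The sketch below also computes the exact UDD value from the defining sum, for comparison; the variable names are our own.

```python
i, q_x, m = 0.21, 0.004, 2
v = 1.0 / (1.0 + i)

# Annuity certain: payments of 1/m at times 0 and 1/2, interest only
certain = sum(v ** (j / m) for j in range(m)) / m

# Two-term approximation (7.13) applied to c = e^0: 1 - ((m-1)/2m) * (1 - v p_x)
approx_713 = 1.0 - (m - 1) / (2 * m) * (1.0 - v * (1.0 - q_x))

# Exact value under UDD and the actual interest rate, from the defining sum
exact_udd = sum(v ** (j / m) * (1.0 - (j / m) * q_x) for j in range(m)) / m

print(round(certain, 3), round(approx_713, 3), round(exact_udd, 4))
```

The exact UDD value lies below the annuity certain, as it must, while the two-term approximation lies above it at this high interest rate.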


Equation (7.13) is one of a series of approximation methods known as 
Woolhouse's formulas. This one is the two-term formula. The next in the series, the three- 
term formula, is not used often but appears to give very good results in certain cases. After 
developing the appropriate tools in the next chapter, we will investigate further the two-term 
formula inconsistency and derive the three-term formula. See Section 8.11. 


*7.6 Fractional period premiums and reserves 


Annual premium and reserve formulas remain basically unchanged when premiums are paid 
mthly. The only difference is that we replace $\ddot{a}_x$ with $\ddot{a}_x^{(m)}$ throughout. The purpose of this 
section is to derive a few useful identities. 

Consider all policies with a particular issue age (x) and with level premiums payable for 
h years. The simplest type of such a policy will be a 1-unit, h-year term policy with annual 
premiums. Let $\tilde{P}$ denote the net annual premium and ${}_k\tilde{V}$ denote the reserve at time k for this 
term policy. Given any other such policy, let P denote the net annual premium if paid yearly, 
and let $P^{(m)}$ denote the net annual premium for this policy if paid mthly (i.e., $P^{(m)}/m$ is paid 
at the beginning of each mth of the year). Let ${}_kV$ denote the reserve at time k on this policy if 
premiums are paid annually and let ${}_kV^{(m)}$ denote the reserve at time k if premiums are paid 
mthly. For simplicity, we will assume constant interest. 

We have 

$$P^{(m)}\,\ddot{a}_x^{(m)}(1_h) = P\,\ddot{a}_x(1_h),$$

as both sides are equal to the present value of the benefits, so that 

$$\frac{P}{P^{(m)}} = \frac{\ddot{a}_x^{(m)}(1_h)}{\ddot{a}_x(1_h)} = \frac{d}{d^{(m)}} - \beta(m)\,\frac{A_x(1_h)}{\ddot{a}_x(1_h)},$$

using (7.8) for the last equality, and we can write 

$$\frac{P}{P^{(m)}} = \frac{d}{d^{(m)}} - \beta(m)\,\tilde{P}. \tag{7.14}$$

Since the value of insurance benefits is independent of the premium frequency, we have, 
for k < h, 

$${}_kV^{(m)} - {}_kV = P\,\ddot{a}_{x+k}(1_{h-k}) - P^{(m)}\,\ddot{a}_{x+k}^{(m)}(1_{h-k})$$
$$= P^{(m)}\left[\frac{P}{P^{(m)}}\,\ddot{a}_{x+k}(1_{h-k}) - \ddot{a}_{x+k}^{(m)}(1_{h-k})\right].$$

Now substitute from (7.8) for $\ddot{a}_{x+k}^{(m)}(1_{h-k})$ and from (7.14) for $P/P^{(m)}$. The term 
$(d/d^{(m)})\,\ddot{a}_{x+k}(1_{h-k})$ cancels out and we are left with 

$${}_kV^{(m)} - {}_kV = \beta(m)\,P^{(m)}\,{}_k\tilde{V}. \tag{7.15}$$


The term on the right represents the additional reserve that must be held at time k in order to 
provide for the premiums that will not be collected in the year of death. 

The significance of (7.14) and (7.15) is as follows. We consider a collection of policies 
with level premiums, common issue age and premium payment period. The benefits can be 
anything at all. We know the premium and reserve for the particular case of the level term 
policy. Then, using only those two quantities, we can easily adjust premiums and reserves 
from annual mode to mthly mode, for all other policies in the collection. 
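
The adjustment described by (7.14) and (7.15) can be sketched as follows. Constant interest at a rate i > 0 is assumed, the function names are our own, and the input figures in the usage lines are hypothetical values, not taken from any table.

```python
def beta(i, m):
    """beta(m) from its defining sum (equal to 0 when m = 1)."""
    v = 1.0 / (1.0 + i)
    return sum(j * v ** (j / m) for j in range(1, m)) / (m * m * v)


def d_ratio(i, m):
    """d / d^(m) for a constant yearly rate i > 0."""
    v = 1.0 / (1.0 + i)
    return (1.0 - v) / (m * (1.0 - v ** (1.0 / m)))


def mthly_premium(P, P_term, i, m):
    """Annualized premium P^(m) from (7.14): P / P^(m) = d/d^(m) - beta(m) * P_term."""
    return P / (d_ratio(i, m) - beta(i, m) * P_term)


def mthly_reserve(V, V_term, P_m, i, m):
    """Reserve with mthly premiums from (7.15): V^(m) = V + beta(m) * P^(m) * V_term."""
    return V + beta(i, m) * P_m * V_term


# Hypothetical inputs: P_term and V_term refer to the 1-unit term policy
P, P_term = 25.0, 0.004
V, V_term = 300.0, 0.02
P_12 = mthly_premium(P, P_term, 0.05, 12)
print(P_12, mthly_reserve(V, V_term, P_12, 0.05, 12))
```

Only the premium and reserve of the level term policy are needed as extra inputs, which is the point made in the text.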


7.7 Reserves at fractional durations 


Insurance companies do a complete calculation of reserves on December 31 (or whenever their 
particular fiscal year ends). However, policyholders are not always accommodating enough 
to purchase their policy on January 1. This means that most of the policy durations at the 
calendar year end will be fractional. In this section we derive formulas for calculating reserves 
at time k + s where k is an integer and 0 < s < 1. Consider the general annual premium policy 
with death benefit vector b and premium vector $\pi$. Suppose first that the death benefit for 
those dying between time k and k + s is paid at time k + s. The reserve at time k + s would 
then be, by a calculation analogous to that used in deriving (6.7), 

$$({}_kV + \pi_k)\,y_x(k + s, k) - b_k\,\frac{{}_sq_{x+k}}{{}_sp_{x+k}}.$$


That is, thinking retrospectively, the balance at time k + s is obtained by taking the balance at 
time k, adding the premium, accumulating up to time k + s at interest and survivorship, and 
then subtracting from each survivor enough to pay the death benefits for those dying between 
time k and time k + s. 

We must, however, adjust this to reflect the fact that the death benefit will not be paid until 
the end of the year, and to do so we multiply the second term by v(k + s, k + 1). Note now 
that 

$$\frac{v(k + s, k + 1)}{{}_sp_{x+k}} = \frac{y_x(k + s, k + 1)}{({}_{1-s}p_{x+k+s})({}_sp_{x+k})} = \frac{y_x(k + s, k + 1)}{p_{x+k}}.$$

From the UDD assumption, ${}_sq_{x+k} = s\,q_{x+k}$, and substituting these last two identities gives 
the desired reserve formula as 

$${}_{k+s}V = ({}_kV + \pi_k)\,y_x(k + s, k) - s\,b_k\,\frac{q_{x+k}}{p_{x+k}}\,y_x(k + s, k + 1)$$
$$= (1 - s)({}_kV + \pi_k)\,y_x(k + s, k) + s\left[({}_kV + \pi_k)\,y_x(k + s, k) - b_k\,\frac{q_{x+k}}{p_{x+k}}\,y_x(k + s, k + 1)\right].$$

Finally, substituting from (6.7) yields 

$${}_{k+s}V = (1 - s)\left[({}_kV + \pi_k)\,y_x(k + s, k)\right] + s\left[{}_{k+1}V\,y_x(k + s, k + 1)\right].$$


In other words, under UDD, the reserve at some intermediate point in the year is obtained by 
linearly interpolating between the initial reserve, accumulated to that point with interest and 
survivorship, and the end-of-year terminal reserve, discounted to that point with interest and 
survivorship. Care must be taken not to use the terminal instead of the initial reserve at time 
k, as that would involve interpolating over a discontinuity. 

A common simplification is to assume that interest and mortality rates are sufficiently 
small so that we can take $y_x$ as identically equal to 1. This results in the approximate formula 

$${}_{k+s}V = (1 - s)({}_kV + \pi_k) + s\,({}_{k+1}V)$$
$$= (1 - s)\,{}_kV + s\,({}_{k+1}V) + (1 - s)\,\pi_k.$$


The third term above is known as the unearned premium. It is simply the total premium paid 
at the beginning of the year multiplied by the fraction of the year remaining. 

When premiums are paid m times a year rather than annually, the same type of approximation 
as above is normally used. That is, we calculate the reserve at time k + s by linearly 
interpolating between the terminal reserves at time k and k + 1 and add the unearned premium. 
To be precise, suppose that s = h/m + r where h is an integer and $0 \le r < 1/m$ and that the 
annual premium $\pi_k$ is paid m times per year. Then a commonly used approximation is 

$${}_{k+s}V = (1 - s)\,{}_kV + s\,({}_{k+1}V) + \left(\frac{1}{m} - r\right)\pi_k.$$

Example 7.6 For a certain policy ${}_3V = 108$, ${}_4V = 180$ and for the fourth year of the policy 
the premium is 60, paid quarterly (i.e., 15 is paid at time 3, $3\tfrac{1}{4}$, $3\tfrac{1}{2}$ and $3\tfrac{3}{4}$). Find the reserve at 
time 3 years and 7 months. 

Solution. We have s = 7/12 = 2/4 + 1/12. The unearned premium is $\left(\frac{1}{4} - \frac{1}{12}\right) 60 = 10$. The 
above formula gives 

$${}_{3\frac{7}{12}}V = \frac{5}{12} \times 108 + \frac{7}{12} \times 180 + 10 = 160.$$

An alternate way of calculating the unearned premium is to note that the reserve is taken 
at a time one-third through the quarter-year period, so the unearned premium is two-thirds of 
the premium payable at the beginning of the quarter, which is $\frac{2}{3} \times 15 = 10$. 
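
The approximation used in Example 7.6 is simple enough to put in a short function. The sketch below uses exact fractions to avoid rounding in s and r; the function name and argument names are hypothetical.

```python
from fractions import Fraction
import math


def fractional_reserve(v_start, v_end, annual_premium, s, m=1):
    """Approximate reserve at duration k + s, 0 < s < 1.

    v_start and v_end are the terminal reserves at times k and k + 1;
    annual_premium is the premium for the year, paid in m installments.
    """
    s = Fraction(s)
    h = math.floor(s * m)                 # whole mthly periods elapsed in the year
    r = s - Fraction(h, m)                # time elapsed in the current period
    unearned = (Fraction(1, m) - r) * annual_premium
    return (1 - s) * v_start + s * v_end + unearned


# Example 7.6: reserves 108 and 180, premium 60 paid quarterly, s = 7/12
print(fractional_reserve(108, 180, 60, Fraction(7, 12), m=4))
```

The printed value should reproduce the reserve of 160 found in the example. With m = 1 the function reduces to the annual premium formula, in which the unearned premium is (1 - s) times the annual premium.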


7.8 Standard notation and terminology 


Most of the standard notation for fractional payments has been discussed already. The one 
remaining symbol is $A^{(m)}$, which is used to denote net single premiums for an insurance in 
which the death payments are paid at the end of the mth of the year in which death occurs. 
For example, if m = 12, the death benefit would be paid at the end of the month of death. 

The assumption of Exercise 7.14 below is known as the Balducci hypothesis, named for 
G. Balducci, an Italian actuary. It arose in conjunction with mortality studies designed to 
produce life tables. 


Exercises 


Assume UDD unless otherwise indicated. 


Type A exercises 


7.1 You are given that $\ddot{a}_{60} = 12$, ${}_{10}p_{50} = 0.8$, and interest is constant at 6%. 
(a) Find the net single premium for an annuity on (50) with level benefit payments of 
1000 per month, beginning at age 50 and continuing for life, with the provision 


that the first 120 monthly payments are guaranteed, regardless of whether (50) is 
alive or not. 


(b) Redo (a), assuming now that the first payment is made at the end of 1 month. 


7.2 You are given that $q_{70} = 0.2$, $q_{71} = 0.25$, $q_{72} = 0.30$. Interest rates are 20% for the 
first 2 years and 30% thereafter. A 3-year life annuity on (70) provides for benefits of 
1000 in the first year, 2000 in the second year and 3000 in the third year, provided that 
(70) is then alive. Suppose that payments are to be made quarterly. Find the present 
value if: (a) the first payment is to be made at age 70; (b) the first payment is to be 
made at age $70\tfrac{1}{4}$. 

7.3 An annuity on (50) provides for yearly payments for 30 years. The amount of the 
income is 10 000 for the first 10 years, 8000 for the next 10 years and 5000 for the 
last 10 years. The net single premium for this annuity is 100 000. You are given that 
the interest rate is a constant 5% and that ${}_{10}p_{50} = 0.9$, ${}_{20}p_{50} = 0.8$, ${}_{30}p_{50} = 0.7$. Find 
the premium if the same income is to be paid monthly instead of annually, assuming 
payments are made (a) at the beginning of each month, (b) at the end of each month. 

7.4 You are given the following figures from a life table: $\ell_{60} = 1000$, $\ell_{61} = 700$, $\ell_{62} = 500$. 
Find the probability that a person now age 602 will die between the ages of 607 3 
and 617. 

7.5 Given $q_{70} = 0.2$, $q_{71} = 0.3$, $q_{72} = 0.4$, find the probability that a person age 707 will 
live to age 72. 

7.6 Instead of assuming UDD, suppose that, for an integer x and 0 < t < 1, we assume 
that ${}_tp_x = p_x^t$. Given $\ell_{60} = 100000$, $\ell_{61} = 81000$, $\ell_{62} = 41472$, find the probability 
that $(60\tfrac{1}{2})$ will die between the ages of $61\tfrac{1}{3}$ and $61\tfrac{2}{3}$. 

7.7 A policy is issued on March 20, 1990. Given that ${}_{10}V = 2000$, ${}_{11}V = 3000$ and that 
the premium is a level 105 per month, find the reserve on December 31, 2000, using 
the standard approximation. Assume that each month is 30 days. 

7.8 For a policy on (x), ${}_5V = 220$, ${}_6V = 80$, $b_5 = 1000$, $i_5 = 0.10$, $q_{x+5} = 0.2$. Find 
${}_{5+1/4}V$. 


Type B exercises 


7.9 Let r be the present value of a life annuity on (x) providing 1 at the beginning of each 
period of length 1/m, with payments guaranteed for n years. Let s be the present value 
of the same annuity but with the payments at the end of each period. If v(n) = 0.9 and 
${}_np_x = 0.6$, find r - s. You are not given the value of m. 

7.10 Show that under a constant interest rate of i, $a^{(m)}(1_n) = (i/i^{(m)})\,a(1_n)$. 

7.11 You want to evaluate aC (k) at a constant interest rate of 20%, where k = 
(10, 9, 8, ..., 1). Assuming that UDD holds, what error is made if you use the zero- 
interest approximation? You are given that $\ddot{a}_x(\mathbf{k}) = 35$ and à,(1,,) = 6. 

*7.12 A 20-year term insurance policy on (50) has a constant death benefit of 1000, and 
net level annual premiums of 10, payable for 20 years. The reserve at time 15 is 60. 
Another policy on (50) has the same premium pattern but different death benefits. On 
this latter policy, the annual premium is 100, and ${}_{15}V = 500$. The interest rate is a 
constant 6%. What is the value of the annual premium and the reserve at time 15 on 
this latter policy, if the premiums are payable monthly instead of annually? 

7.13 Suppose we assume that for all integers x and 0 < t < 1, 

$$\ell_{x+t} = \ell_x^{\,1-t}\,\ell_{x+1}^{\,t}.$$

(a) Show that this is equivalent to the assumption of Exercise 7.6. 

(b) Show that if (2.9) holds for v, then it also holds for the discount function $y_x$, 
whenever s, t, s + h, t + h are all in the interval [k, k + 1) for some nonnegative 
integer k. (See Exercise 8.21 for more on this assumption.) 

7.14 Suppose we assume that for all integers x and 0 < t < 1, 

$$\ell_{x+t}^{-1} = (1 - t)\,\ell_x^{-1} + t\,\ell_{x+1}^{-1}.$$

Show that for all integers x and 0 < t < 1, ${}_{1-t}q_{x+t} = (1 - t)\,q_x$. 

7.15 Assume UDD. Suppose that a life table is such that $q_x < q_{x+1}$ for all nonnegative 
integers x. 

(a) Show that for any t > 0 and integers x < y we have ${}_tq_x \le {}_tq_y$. 

(c) Show that under the assumption of Exercise 7.6, it is true that for all t > 0 and 
positive numbers x, y, we have ${}_tq_x \le {}_tq_y$. 


Spreadsheet exercise 


7.16 Modify the Chapter 4 spreadsheet so that by entering the frequency m, the spreadsheet 
will calculate both due and immediate annuities payable m times per year. 


Continuous payments 


8.1 Introduction to continuous annuities 


In this section we consider annuities where payments are made continuously. Naturally, this 
is not physically possible, but we can picture these as a limiting case of mthly annuities as 
m goes to infinity. Suppose, for example, that you are to receive a total of 36 500 units each 
year. This could be done by paying 100 units per day, or 4 1/6 units per hour, or 0.001157 
units per second and so on. If you can imagine payments coming in every nanosecond, you 
may get some feeling for what a continuous annuity would be like. 

This may seem a somewhat artificial concept, but there are many uses for continuous 
annuities. They can be used to approximate mthly annuity values for large values of m. 
Moreover, we will show that insurance contracts with the realistic provision of benefit pay- 
ments at the moment of death, can be viewed as continuous annuities. For insurance contracts 
purchased by continuous premium payments we can derive some interesting mathematical 
relationships that are analogues of those that appeared in Chapters 5 and 6. 

In the continuous case, we cannot specify an actual payment at any point of time. As shown 
by the figures above, this approaches zero as the frequency of payment increases. Instead we 
must speak of the periodic rate of payment. Consider a monthly annuity. If the payment in 1 
month is 100, we could describe this by saying that the annual rate of payment for that month 
is 1200. This would mean that if the payments remained at the same monthly level for a year, 
then the total payment for that year would be 1200. Each of the annuities described above 
would be paid at the annual rate of 36 500. Of course in an annuity paid m times per year, the 
actual payment and therefore the periodic rate of payment could change every mth period. In 
our continuous annuity, it could change from moment to moment. (Picture a person standing 
under a chute, receiving money that is flowing down continuously. The speed at which it 
comes down could change at each instant.) 

In place of a cash flow vector we now have a cash flow function c defined on [0, N], where 
c(t) is the periodic rate of payment at time t. 

We will assume that all our cash flow functions are piecewise continuous. That is, there is 
a partition, $0 = a_0 < a_1 < \cdots < a_n = N$, such that c is continuous on the intervals $(a_{i-1}, a_i)$, 
i = 1, 2, ..., n. 

Suppose we have an arbitrary discount function y and a continuous annuity with cash flow
function c. The present value of the annuity will be denoted by $\bar{a}(c; y)$ or just $\bar{a}(c)$. (The ‘bar’ is
a standard actuarial notational device to denote continuous as opposed to discrete quantities.)
Consider an approximating annuity with payments made mthly. That is, payments are made
at time j/m where j is an integer between 0 and mN − 1. We assume that N is an integer and
the annual rate of payment at time j/m is c(j/m). The actual payment at time j/m is therefore
(1/m)c(j/m), so the present value of this mthly annuity is

\[ \frac{1}{m}\sum_{j=0}^{mN-1} c(j/m)\,y(j/m). \tag{8.1} \]

Taking the limit as m goes to ∞ gives one of the key formulas of this chapter,

\[ \bar{a}(c; y) = \int_0^N c(t)\,y(t)\,dt, \tag{8.2} \]

since the mthly present values are Riemann sums for this integral.
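The convergence of the mthly present values in (8.1) to the integral in (8.2) can be checked numerically. The following is a minimal sketch only: the level payment rate, the exponential discount function with force 0.05, and the function name `mthly_present_value` are illustrative assumptions, not part of the text.

```python
import math

def mthly_present_value(c, y, N, m):
    """Present value (8.1): payments of c(j/m)/m at times j/m, j = 0, ..., mN - 1."""
    return sum(c(j / m) * y(j / m) for j in range(m * N)) / m

delta = 0.05                               # assumed constant force of discount
c = lambda t: 1.0                          # assumed level payment rate of 1 per year
y = lambda t: math.exp(-delta * t)         # assumed discount function
N = 10

# Closed form of the integral in (8.2) for this particular c and y.
exact = (1 - math.exp(-delta * N)) / delta

for m in (1, 12, 365, 10_000):
    print(m, mthly_present_value(c, y, N, m), exact)
```

Because the payments here are made at the start of each mthly period and the integrand is decreasing, the sums approach the integral from above as m grows.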


8.2 The force of discount 


Throughout the book we use ‘log’ to refer to the natural logarithm (often denoted by ln or $\log_e$).

The evaluation of continuous streams of cash flows leads naturally to a new function, 
which we will spend some time investigating before returning to the main topic. It can be 
written in several equivalent forms. 


Definition 8.1 Given an arbitrary discount function y, define another function, $\delta_y(t)$, which
we will call the force of discount associated with y, by

\[
\begin{aligned}
\delta_y(t) &= \lim_{h\to 0}\frac{y(t+h,t)-1}{h} = \lim_{h\to 0}\frac{y(t+h,0)-y(t,0)}{h\,y(t,0)} = \frac{\frac{d}{dt}y(t,0)}{y(t,0)} \\
&= \frac{d}{dt}\log y(t,0) = -\frac{d}{dt}\log y(t) = -\frac{\frac{d}{dt}y(t)}{y(t)}.
\end{aligned}
\tag{8.3}
\]

(To derive the second last equality, recall that $y(t) = y(0, t) = y(t, 0)^{-1}$.)

From the first equality in (8.3) we see that $\delta_y$ represents a relative rate of growth. Each
unit invested at time t will accumulate to y(t + h, t) units over the next h time periods, so the
numerator gives us the relative growth rate over this time interval. We divide by h to get the
relative rate of growth per period, and take limits to get the instantaneous periodic relative
rate of growth at time t.

We are often given $\delta_y$ and wish to recover y.
Proposition 8.1

\[ y(t) = e^{-\int_0^t \delta_y(r)\,dr}. \]

Proof. From the second last expression in (8.3), we integrate to get

\[ \int_0^t \delta_y(r)\,dr = -\int_0^t \frac{d}{dr}\left[\log y(r)\right]dr = -\log y(t) + \log y(0). \]

We use the fundamental theorem of calculus for the second equality. Note that log y(0) =
log 1 = 0, multiply by −1, and take exponentials to complete the proof. □


An immediate consequence of Proposition 8.1 is that for s < t,

\[ y(s,t) = y(t)/y(s) = e^{-\int_s^t \delta_y(r)\,dr} = e^{-\int_0^{t-s}\delta_y(s+r)\,dr}. \tag{8.4} \]
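As a quick numerical illustration of Definition 8.1 and Proposition 8.1, one can start from a force of discount, build y by the exponential formula, and then recover the force from y by differentiation. This is a sketch under assumptions: the linear force 0.03 + 0.002t and the helper names are hypothetical, chosen only so that the integral can be done by hand.

```python
import math

def force(t):
    """Hypothetical force of discount, increasing linearly in t."""
    return 0.03 + 0.002 * t

def y(t):
    """Proposition 8.1, with the integral of force() over [0, t] evaluated by hand."""
    return math.exp(-(0.03 * t + 0.001 * t * t))

def force_from_y(t, h=1e-6):
    """Last expression in (8.3), -y'(t)/y(t), using a central difference for y'."""
    return -(y(t + h) - y(t - h)) / (2 * h * y(t))

def discount_between(s, t):
    """y(s, t) for s < t, taken as the ratio y(t)/y(s)."""
    return y(t) / y(s)

print(force(5.0), force_from_y(5.0))
print(discount_between(2.0, 5.0), math.exp(-(0.03 * 3 + 0.001 * (25 - 4))))
```

The second printed pair compares the ratio y(t)/y(s) with the exponential of minus the integral of the force over [s, t].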


Remark The traditional use of the force of discount is for the investment discount function
v. In this case $\delta_v(t)$ will be denoted by just δ(t). The standard actuarial notation is $\delta_t$, but we
find it more convenient to use the bracket in place of a subscript to parallel our treatment for
the symbol v. This quantity is commonly referred to as the force of interest. The terminology is
justified since for an instantaneous period there is no difference between interest and discount.
(This is shown precisely by the equality of the fourth and last expressions in (8.3).)


8.3 The constant interest case 


Consider the constant interest discount function given by $v(t) = v^t$. Then δ(t) is also constant
and denoted just by δ. Since $\log(v^t) = t\log(v)$, the second last expression in (8.3) shows that

\[ \delta = -\log(v) = \log(1 + i), \tag{8.5} \]

and Proposition 8.1 gives

\[ v(t) = e^{-\delta t}, \qquad (1+i)^t = e^{\delta t}. \tag{8.6} \]

The second expression reflects the fact that under constant interest, invested capital grows
exponentially and δ is just the usual rate of exponential growth.

Additional insight may be gained by comparing δ with the fractional interest quantities of
Chapter 7. We have that

\[ \delta = \lim_{m\to\infty} i^{(m)} = \lim_{m\to\infty} d^{(m)}. \]

This can be seen by writing

\[ i^{(m)} = \frac{(1+i)^{1/m} - 1}{1/m} = \frac{(1+i)^{1/m} - (1+i)^0}{1/m}, \]

so that

\[ \lim_{m\to\infty} i^{(m)} = \left.\frac{d}{dx}(1+i)^x\right|_{x=0} = \log(1+i) = \delta. \]

The calculation for $d^{(m)}$ is similar.

The quantity δ is often thought of as an instantaneous rate of interest, but it is important
to note that just like the rates $i^{(m)}$ and $d^{(m)}$, its value depends on the underlying period. The
transformation for change of period though is much easier than for the interest rates, as it just
involves a proportional change. For example, if the annual force of interest is 0.06, then the
force of interest for a half year period will be simply 0.03.
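The limits above are easy to observe numerically. The sketch below assumes the usual definitions of the nominal rates, $i^{(m)} = m[(1+i)^{1/m} - 1]$ and $d^{(m)} = m[1 - (1+i)^{-1/m}]$; the function names are illustrative.

```python
import math

i = 0.06
delta = math.log(1 + i)          # force of interest, as in (8.5)

def nominal_interest(i, m):
    """i^(m): nominal annual interest rate convertible m times per year."""
    return m * ((1 + i) ** (1 / m) - 1)

def nominal_discount(i, m):
    """d^(m): nominal annual discount rate convertible m times per year."""
    return m * (1 - (1 + i) ** (-1 / m))

for m in (1, 2, 12, 365, 10**6):
    print(m, nominal_discount(i, m), delta, nominal_interest(i, m))
```

For every m the force of interest sits between the two nominal rates, and both close in on it as m increases.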

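We now return to continuous annuities, using a constant interest rate, and show that in
certain cases we get expressions analogous to discrete annuities, but with δ replacing d. We
will derive the continuous versions of (2.16) and (2.17). Define a function $1_n$ by

\[ 1_n(t) = \begin{cases} 1, & \text{if } t \le n, \\ 0, & \text{if } t > n. \end{cases} \]

This is the same notation as for the analogous vector, but the meaning should be clear from
the context. We also define the function $I_n(t)$ by

\[ I_n(t) = \begin{cases} t, & \text{if } t \le n, \\ 0, & \text{if } t > n. \end{cases} \]

From (8.2) and (8.6),

\[ \bar{a}(1_n) = \int_0^n e^{-\delta t}\,dt = \frac{1 - e^{-\delta n}}{\delta}. \tag{8.7} \]

Integrating by parts,

\[ \bar{a}(I_n) = \int_0^n t\,e^{-\delta t}\,dt = \left[-\frac{t\,e^{-\delta t}}{\delta}\right]_0^n + \frac{1}{\delta}\int_0^n e^{-\delta t}\,dt = \frac{\bar{a}(1_n) - n\,e^{-\delta n}}{\delta}. \tag{8.8} \]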
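The closed forms (8.7) and (8.8) can be compared with a direct numerical integration of (8.2). This is only a sketch: the midpoint rule, the values δ = 0.05 and n = 10, and the helper name `midpoint_integral` are assumptions made for illustration.

```python
import math

def midpoint_integral(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

delta, n = 0.05, 10

# Closed forms: level annuity (8.7) and linearly increasing annuity (8.8).
a_level = (1 - math.exp(-delta * n)) / delta
a_increasing = (a_level - n * math.exp(-delta * n)) / delta

# The same quantities straight from the integral (8.2) with v(t) = exp(-delta t).
a_level_num = midpoint_integral(lambda t: math.exp(-delta * t), 0, n)
a_increasing_num = midpoint_integral(lambda t: t * math.exp(-delta * t), 0, n)

print(a_level, a_level_num)
print(a_increasing, a_increasing_num)
```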


8.4 Continuous life annuities 


8.4.1 Basic definition 


The present value of an annuity issued to (x) that provides benefits paid continuously at the
annual rate of c(t) at time t, provided (x) is alive, will be denoted by $\bar{a}_x(c)$. Formula (8.2) gives
the exact formula

\[ \bar{a}_x(c) = \int_0^{\omega - x} c(t)\,v(t)\,{}_tp_x\,dt. \tag{8.9} \]

Note that this formula follows essentially the same pattern that we observed after formula
(5.1). Instead of summing we integrate, and for each t the integrand is the product of three
factors: the interest discount factor; the ‘amount paid’ at time t (which we can intuitively
think of as c(t)dt); and the probability that the payment is made.


8.4.2 Evaluation 


We will generally only have available the values of $\ell_x$ for integer values of x, and must
interpolate for the remaining values in order to evaluate the integral. As in Chapter 7, we will
assume UDD, which will allow us to express continuous annuities in terms of annual annuities.
The procedure closely parallels that for mthly annuities and we could simply write down the
formulas as limiting cases of those in Chapter 7. It is instructive however to proceed anew.

For simplicity of notation, we will at first assume constant interest. The initial step is to
define the continuous analogues of the functions β and α introduced in Chapter 7. Let

\[ \beta(\infty) = (1+i)\,\bar{a}(I_1) = \frac{e^{\delta} - 1 - \delta}{\delta^2} = \frac{i - \delta}{\delta^2}, \]

invoking (8.7) and (8.8). Let

\[ \alpha(\infty) = \frac{id}{\delta^2}. \]

It is clear that β(∞) and α(∞) are the respective limits of β(m) and α(m) as m approaches ∞.
We now use the same procedure as we used in Section 7.1 and replace all the payments in
any 1 year by a single payment, equal to the value of these payments at the beginning of the
year. This replacement principle holds in the continuous case as well. For a formal verification
use the standard integration result

\[ \int_0^b = \int_0^a + \int_a^b \qquad \text{for } a < b. \]


Analogously to (7.2) this leads to

\[ \bar{a}_x(c) = \ddot{a}_x(z), \tag{8.10} \]

where

\[ z_k = \int_0^1 c(k+s)\,v^s\,{}_sp_{x+k}\,ds = \int_0^1 c(k+s)\,v^s\,(1 - s\,q_{x+k})\,ds. \tag{8.11} \]

This is easily evaluated when the rate of payment is constant over each year. For a
nonnegative integer k, let $c_k$ denote the constant value of c(k + s), 0 ≤ s < 1. We can factor $c_k$
outside of the integral sign in (8.11), which gives

\[ z_k = c_k\left[\bar{a}(1_1) - q_{x+k}\,\bar{a}(I_1)\right] = c_k\left[\frac{d}{\delta} - \beta(\infty)\,v\,q_{x+k}\right]. \]

Then, from (8.10),

\[ \ddot{a}_x(z) = \frac{d}{\delta}\,\ddot{a}_x(\mathbf{c}) - \beta(\infty)\,A_x(\mathbf{c}), \]

where $\mathbf{c} = (c_0, c_1, \ldots)$. Finally, from Theorem 5.1,

\[ \bar{a}_x(c) = \alpha(\infty)\,\ddot{a}_x(\mathbf{c}) - \beta(\infty)\,\ddot{a}_x(\Delta\mathbf{c}). \tag{8.12} \]


As promised, this reduces the calculation to finding the present value of yearly annuities.
As in the mthly case, it is usual in practice to approximate this by calculating the adjusting
factors at zero interest. We then have α(∞) = 1, β(∞) = 1/2, and the approximating formula is

\[ \bar{a}_x(c) \approx \ddot{a}_x(\mathbf{c}) - \tfrac{1}{2}\,\ddot{a}_x(\Delta\mathbf{c}). \]

In the case where c = 1, so that Δc = (1, 0, 0, ...), we get the familiar equation

\[ \bar{a}_x \approx \ddot{a}_x - \tfrac{1}{2}. \tag{8.13} \]

This concludes the development of formulas for the constant interest case. When rates are
only constant over each year, but can vary from year to year, we must replace the constants
d/δ, α(∞) and β(∞) by vectors, analogously to the treatment in the discrete case.
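To see the UDD reduction (8.12) at work for a level annuity of 1 per year, one can compare it with a direct numerical integration of (8.9). The sketch below assumes α(∞) = id/δ² and β(∞) = (i − δ)/δ², and it uses a small made-up life table (`lx`) purely for illustration; none of these numbers come from the text.

```python
import math

i = 0.06
v = 1 / (1 + i)
d = i * v
delta = math.log(1 + i)
alpha_inf = i * d / delta**2          # assumed form of alpha(infinity)
beta_inf = (i - delta) / delta**2     # assumed form of beta(infinity)

lx = [100.0, 90.0, 70.0, 40.0, 0.0]   # hypothetical l_x at integer ages x, x+1, ...

def survival_udd(t):
    """tp_x obtained by linear interpolation of l_x within each year (UDD)."""
    if t >= len(lx) - 1:
        return 0.0
    k = int(t)
    return (lx[k] + (t - k) * (lx[k + 1] - lx[k])) / lx[0]

# Annual annuity-due of 1 per year, then the continuous value via (8.12) with c = 1.
annual_due = sum(v**k * lx[k] / lx[0] for k in range(len(lx) - 1))
via_812 = alpha_inf * annual_due - beta_inf

# Direct midpoint-rule integration of v^t * tp_x over the whole future lifetime.
steps = 40_000
h = (len(lx) - 1) / steps
direct = h * sum(v ** ((j + 0.5) * h) * survival_udd((j + 0.5) * h) for j in range(steps))

print(via_812, direct, annual_due - 0.5)
```

The last printed value is the zero-interest shortcut (8.13), which is close to, but not equal to, the exact UDD value.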


8.4.3 Life expectancy revisited 


Comparing the formula for life expectancy (3.8) with formula (4.1), we see that $e_x$ is simply
the present value of a life annuity of 1 per year, beginning at time 1, at a constant interest rate
of 0. This makes sense intuitively, since at a zero rate of interest, the present value of the annuity
will simply be the total amount paid, which in turn is just the whole number of years lived,
which on average is just $e_x$. We can now write down a more rigorous formula for the complete
life expectancy, as the present value of a life annuity paying 1 per year for life continuously,
at zero interest. This gives

\[ \mathring{e}_x = \int_0^{\infty} {}_tp_x\,dt. \]


Assuming UDD, we apply the formulas given above. With $c = 1_{\omega - x}$ and an interest rate
of zero,

\[ \ddot{a}_x = 1 + e_x, \qquad \ddot{a}_x(\Delta c) = 1, \]

and from (8.13),

\[ \mathring{e}_x = e_x + \tfrac{1}{2}, \]

recovering the approximation we obtained intuitively in Chapter 3.

Similarly, the complete n-year temporary life expectancy is given by $\int_0^n {}_tp_x\,dt$, which under
UDD is equal to $\sum_{k=1}^{n} {}_kp_x + \tfrac{1}{2}\,{}_nq_x$, as we obtained previously.

Here is an interesting use for life expectancy. Suppose you are given $\mathring{e}_x$ and the investment
discount function v, but not the actual life table, and you want to approximate $\bar{a}_x$. One natural
way to do it is simply to take $\bar{a}(1_{\mathring{e}_x}; v)$ for this approximation. Indeed, some might even think that
this (or an appropriate modification in the discrete case) should be exact, reasoning that the
cost of a life annuity on a person should be the cost of providing the individual with income 
up to their precise life expectancy. The approximation should be close, but in the normal case 
of positive interest rates, this method always overstates the true value. A mathematical proof 
is outlined in Exercise 22.15 but we can easily explain this fact by general reasoning. Suppose 
an insurer sells 1-unit continuous life annuities to a group of people age x and charges them 
each the price of a fixed period annuity with the period equal to their life expectancy. The 
insurer can expect to gain money on those who die before living their life expectancy, but lose 
money on those who live longer, and these gains and losses should cancel out. However, since 
the gains come earlier, they will produce extra returns due to interest earnings and the insurer 
will end up with more than they need to provide the benefits. The break even premium will 
then be somewhat less than the premium for the fixed period annuity. 


8.5 The force of mortality 


Before proceeding with insurances, we introduce a new quantity that is analogous to the force 
of discount. 

We first motivate this intuitively. What is the probability that (x) will die in the next instant
of time? We could approximate this by looking at ${}_hq_x$, but if we assume continuity and take
the limit as h approaches 0, we would get 0. The question is answerable but the answer of 0 is
uninteresting and gives us no information about the mortality of (x) at that point. Instead, we
can compute an annual rate of mortality at age x, by dividing by h before taking the limit.
This leads to the following definition.


Definition 8.2 The force of mortality at age x is the quantity

\[ \mu(x) = \lim_{h\to 0}\frac{{}_hq_x}{h} = \lim_{h\to 0}\frac{1 - {}_hp_x}{h} = \lim_{h\to 0}\frac{\ell_x - \ell_{x+h}}{h\,\ell_x} = -\frac{\frac{d}{dx}\ell_x}{\ell_x} = -\frac{d}{dx}\log \ell_x. \tag{8.14} \]

The expression after the third equality is useful for obtaining additional insight. We have
a group of lives declining over time due to mortality. The quantity μ(x) gives us the relative
rate of decline in this group at age x.

In many cases we are looking at a fixed age x and we want the variable to be the time t.
We will accordingly define

\[ \mu_x(t) = \mu(x + t). \]

We can view $\mu_x(t)$ as the force of mortality at time t for an individual age x at time 0. From
the fourth expression in (8.14),

\[ \mu_x(t) = -\frac{\frac{d}{dt}\,{}_tp_x}{{}_tp_x}, \tag{8.15} \]

and we imitate the proof of Proposition 8.1 to derive

\[ {}_tp_x = e^{-\int_0^t \mu_x(r)\,dr}. \tag{8.16} \]
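Example 8.1 Suppose that the force of mortality is given by

\[ \mu(x) = \frac{1}{\omega - x}, \qquad x < \omega. \]

Find an expression for ${}_tp_x$.

Solution. For t < ω − x,

\[ -\int_0^t \frac{1}{\omega - x - r}\,dr = \log(\omega - x - t) - \log(\omega - x) = \log\left(\frac{\omega - x - t}{\omega - x}\right). \]

We then substitute in (8.16) to get

\[ {}_tp_x = \frac{\omega - x - t}{\omega - x} = 1 - \frac{t}{\omega - x}. \]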

The following quantity is often useful.

Definition 8.3 Let $\delta_{y_x}(t)$ denote the force of discount for the interest and survivorship function
$y_x$ at age x.

Note that

\[ \delta_{y_x}(t) = -\frac{d}{dt}\log y_x(t) = -\frac{d}{dt}\log v(t) - \frac{d}{dt}\log {}_tp_x = \delta(t) + \mu_x(t). \tag{8.17} \]

In other words, the force of discount for interest and survivorship is just the sum of the forces
for these two components. We can then view the function $\mu_x$ as the particular case of $\delta_{y_x}$ when
interest rates are zero.


8.6 Insurances payable at the moment of death 


8.6.1 Basic definitions 


Unlike annuities, the continuous version is the realistic model for insurance policies. It captures
the idea that, in practice, claims are paid at the moment of death rather than the end of the
year of death. We can also allow for the death benefit to vary continuously with time. In place
of our death benefit vector, we have a death benefit function. This is a piecewise continuous
function b defined on [0, ω − x) where b(t) is the payment made at time t should death occur
at that time. The net single premium for such a policy on (x) will be denoted by $\bar{A}_x(b)$. To
calculate this, we will consider an approximating policy. Let m be a positive integer. If death
occurs between time j/m and (j + 1)/m, where j = 0, 1, ..., m(ω − x) − 1, our approximating
policy will pay b(j/m) at time (j + 1)/m. Up to now we have looked at the case of m = 1. If
m = 365, for example, it would mean that death benefits were paid at midnight on the day of
death, clearly getting close to what we want. If we let $A^{(m)}_x(b)$ denote the net single premium
for this approximating policy, we will have

\[ \bar{A}_x(b) = \lim_{m\to\infty} A^{(m)}_x(b). \]

Reasoning as in the middle expression of formula (5.1), we have

\[ A^{(m)}_x(b) = b(0)\,v(1/m)\left(1 - {}_{1/m}p_x\right) + b(1/m)\,v(2/m)\left({}_{1/m}p_x - {}_{2/m}p_x\right) + \cdots. \]


Taking limits as m goes to ∞,

\[ \bar{A}_x(b) = -\int_0^{\omega - x} b(t)\,v(t)\,d({}_tp_x) = -\int_0^{\omega - x} b(t)\,v(t)\,\frac{d}{dt}\,{}_tp_x\,dt, \]

and from (8.15) we obtain our final formula

\[ \bar{A}_x(b) = \int_0^{\omega - x} b(t)\,v(t)\,{}_tp_x\,\mu_x(t)\,dt. \tag{8.18} \]

Once again we have the familiar three-factor product: the amount paid; the interest factor;
and the probability that payment is made. We can write the latter as being itself the product of
two terms: ${}_tp_x$, the probability of living to time t; and $\mu_x(t)\,dt$, which we can intuitively think
of as the probability of dying at exact moment t.

As well, this formula illustrates our statement above that insurances payable at the moment
of death can be viewed as continuous annuities. Analogously to (5.5),

\[ \bar{A}_x(b) = \bar{a}_x(c), \quad \text{where } c(t) = b(t)\,\mu_x(t). \tag{8.19} \]


8.6.2 Evaluation 


As with continuous annuities, we normally will have only the life table available and will use
UDD in order to evaluate the integral in (8.18). We again will simplify the notation by first
using constant interest. The most useful consequence of UDD for this purpose is

\[ {}_tp_y\,\mu_y(t) = -\frac{d}{dt}\,{}_tp_y = -\frac{d}{dt}(1 - t\,q_y) = q_y, \tag{8.20} \]

where y is an integer and 0 < t < 1.

As in the annuity case, we have

\[ \bar{A}_x(b) = \ddot{a}_x(z), \tag{8.21} \]

where $z_k$ is the value at time k of the benefits paid in the year k to k + 1. Substituting from
(8.20),

\[ z_k = \int_0^1 v(t)\,b(k+t)\,{}_tp_{x+k}\,\mu_{x+k}(t)\,dt = v\,q_{x+k}\int_0^1 (1+i)^{1-t}\,b(k+t)\,dt. \]

This leads to

\[ \bar{A}_x(b) = A_x(\tilde{b}), \quad \text{where } \tilde{b}_k = \int_0^1 (1+i)^{1-t}\,b(k+t)\,dt, \tag{8.22} \]
0 


thereby reducing the evaluation of moment of death insurances to insurances paid at the end 
of the year of death. 

This is further simplified in the case where the benefit function b is constant over each
year. If $b_k$ is the constant value of b(k + t) where k is an integer and 0 ≤ t < 1, then $b_k$ factors
out of the integral in (8.22) and

\[ \tilde{b}_k = b_k\int_0^1 (1+i)^{1-t}\,dt = b_k\,\frac{i}{\delta}, \]

leading to the very simple formula

\[ \bar{A}_x(b) = \frac{i}{\delta}\,A_x(\mathbf{b}), \tag{8.23} \]

where $\mathbf{b} = (b_0, b_1, \ldots)$.

Additional insight can be obtained by observing that

\[ \frac{i}{\delta} = \frac{e^{\delta} - 1}{\delta} \doteq 1 + \frac{\delta}{2} \doteq 1 + \frac{i}{2}. \]

(Here, ≐ stands for ‘approximately equal to’.) This reflects the fact that in the case where
benefits are constant over the year, the difference between paying benefits at the end of the
year of death and at the moment of death is that in the latter case the insurer will not earn
interest from the time of death to the end of the year, and the premium must be higher to
account for this. This time interval will vary with time of death, ranging from 0 for those
dying at the end of the year, to 1 year for those dying at the beginning. On average it should
be half a year.

In the case where interest is only constant over each year, we replace the right hand side
in (8.23) by $A_x(i/\delta * \mathbf{b})$, where now i/δ is a vector with entries of $i_k/\delta_k$.


Example 8.2 An insurance contract on (x) provides for 2 paid at the moment of death if
this occurs within n years, plus a pure endowment of 3 if (x) is alive at the end of n years. Find
the present value given that interest rates are a constant 6%, $v^n\,{}_np_x = 0.4$ and $A_x(1_n) = 0.2$.

Solution. Care must be taken to apply the adjustment factor only to the term insurance, and
not to the pure endowment, which is always paid at time n. At 6%, we have i/δ = 1.0297 so
the present value is equal to 2(0.2)(1.0297) + 3(0.4) = 1.612.
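The arithmetic of Example 8.2 can be reproduced in a few lines. This is just a sketch of the calculation; the variable names are illustrative.

```python
import math

i = 0.06
delta = math.log(1 + i)
adjustment = i / delta               # the factor i/delta from (8.23)

term_insurance = 0.2                 # given end-of-year value A_x(1_n)
pure_endowment = 0.4                 # given v^n * np_x

# The adjustment applies only to the death benefit, not to the endowment at time n.
present_value = 2 * term_insurance * adjustment + 3 * pure_endowment

print(round(adjustment, 4), round(present_value, 3))
```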


When benefit payments are not constant over each year, we have to resort directly to (8.22) 
and try to evaluate the integral. 


Example 8.3 An insurance policy on (x) provides for a death benefit of t paid at the moment
of death, should this occur at time t. Assuming UDD, find a formula for the present value in
terms of end-of-the-year-of-death insurances.

Solution. For an integer k and 0 < t < 1, we have the death benefit b(k + t) = k + t = (k +
1) − (1 − t). We can think of this as two policies, one paying k + 1 for death between time k and
time k + 1, and the other paying −(1 − t) for death at time k + t, where k = 0, 1, .... (Negative
death benefits may be unrealistic in practice, but they are fine from a mathematical viewpoint.)
From (8.23), the present value of the first policy is $(i/\delta)A_x(\mathbf{j})$, where $\mathbf{j} = (1, 2, 3, \ldots)$. From
(8.22) the present value of the second policy is the same as that with a constant end of the
year death benefit of

\[ -\int_0^1 (1+i)^{1-t}(1-t)\,dt = -\left(\frac{1+i}{\delta} - \frac{i}{\delta^2}\right) = -\frac{i}{\delta}\left(\frac{1}{d} - \frac{1}{\delta}\right) \]

after integration by parts. The total premium is therefore

\[ \frac{i}{\delta}\left[A_x(\mathbf{j}) - \left(\frac{1}{d} - \frac{1}{\delta}\right)A_x\right]. \]


It may appear more natural to have taken the split of the death benefit as k plus t. The 
split above was done to get an expression involving the death benefit vector j, representing 
a linearly increasing benefit starting at 1 unit. This is often encountered in practice. The 
alternative method would result in a policy based on a vector with initial entry 0, which would 
not normally be found in a real-life policy. 
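The integral that arises for the second policy in Example 8.3 can be checked numerically against a closed form. The sketch below evaluates the integral of $(1+i)^{1-t}(1-t)$ over one year by the midpoint rule and compares it with $(i/\delta)(1/d - 1/\delta)$; the interest rate of 6% and the step count are illustrative assumptions.

```python
import math

i = 0.06
delta = math.log(1 + i)
d = i / (1 + i)

# Closed form obtained by integrating by parts.
closed_form = (i / delta) * (1 / d - 1 / delta)

# Midpoint-rule evaluation of the same integral over [0, 1].
steps = 10_000
h = 1 / steps
numeric = h * sum(
    (1 + i) ** (1 - (k + 0.5) * h) * (1 - (k + 0.5) * h) for k in range(steps)
)

print(closed_form, numeric)
```

The value is a little above one half, which fits the intuition that 1 − t averages one half over the year while the interest factor pushes the weighted average up slightly.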


In a more complicated case the integration could be difficult to carry out, necessitating 
the use of an approximate integration formula. This can arise in the case of death benefits in 
pension plans that may depend on salary and length of service and could vary continuously 
over each year. A frequently used, very simple rule, is simply to assume that the death 
benefit for the year from time k to k+ 1 is constant at the value which it takes at time 
k+ 1/2. 


8.7 Premiums and reserves 


The realistic model for life insurance calls for death benefits to be paid at the moment of 
death and premiums to be paid periodically (yearly, monthly, quarterly, etc.). This means that 
we have to treat death benefits separately when computing premiums and reserves, so we 
cannot conveniently calculate the net cash flow vector f as we did in Section 6.1. However,
this usually causes little difficulty as we can simply change $A_x(b)$ to $\bar{A}_x(b)$ in all premium and
reserve formulas.

We also must define the shifted functions b ∘ k analogous to that for vectors. That is,

\[ (b \circ k)(t) = b(k + t), \qquad 0 \le t < \omega - x - k. \]


Example 8.4 Consider a policy on (40) with death benefit function b (constant over each
year). The total premium for the year k to k + 1 is $\pi_k$, payable monthly. The interest rate is a
constant 0.06. Write a formula for the reserve at time 10. Assume UDD.

Solution. We have

\[ {}_{10}V = \bar{A}_{50}(b \circ 10) - \ddot{a}^{(12)}_{50}(\pi \circ 10), \]

where $\pi = (\pi_0, \pi_1, \ldots)$. For i = 0.06,

\[ \frac{i}{\delta} = 1.0297, \qquad \alpha(12) = 1.00028, \qquad \beta(12) = 0.46812, \]

and we can write

\[ {}_{10}V = 1.0297\,A_{50}(\mathbf{b} \circ 10) - 1.00028\,\ddot{a}_{50}(\pi \circ 10) + 0.46812\,\ddot{a}_{50}(\Delta(\pi \circ 10)), \]

with $\mathbf{b} = (b_0, b_1, \ldots)$, where $b_k$ is the constant value of b(t) for k ≤ t < k + 1.
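The three adjustment constants used in Example 8.4 can be recomputed directly. The sketch below assumes the usual UDD forms from the mthly theory, $\alpha(m) = id/(i^{(m)}d^{(m)})$ and $\beta(m) = (i - i^{(m)})/(i^{(m)}d^{(m)})$; these formulas are not restated in this section, so treat them as an assumption of the sketch.

```python
import math

i = 0.06
m = 12
delta = math.log(1 + i)
d = i / (1 + i)

i_m = m * ((1 + i) ** (1 / m) - 1)        # nominal interest rate i^(12)
d_m = m * (1 - (1 + i) ** (-1 / m))       # nominal discount rate d^(12)

insurance_factor = i / delta               # multiplies the end-of-year insurance value
alpha_m = i * d / (i_m * d_m)              # assumed UDD form of alpha(12)
beta_m = (i - i_m) / (i_m * d_m)           # assumed UDD form of beta(12)

print(round(insurance_factor, 4), round(alpha_m, 5), round(beta_m, 5))
```

The printed values agree with the constants quoted in the example to about four decimal places.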


In order to avoid this dual treatment, it is sometimes convenient to consider a policy where 
premiums are payable continuously. This is unrealistic, but a reasonable approximation in the 
case of monthly premiums. It often results in simplifying the mathematics. We will look at 
cases where this is useful in the next two sections. 


8.8 The general insurance-annuity identity in the continuous case


In this section we develop a connection between moment-of-death insurances and continuous
annuities, which is analogous to that obtained in Section 5.7. We now assume that the
associated benefit function b is differentiable except possibly at the endpoints of an interval.
That is, we suppose there exist points r and s such that b(t) is equal to 0 for 0 ≤ t < r and
s < t < ω − x, and that the derivative b′(t) exists for r < t < s.

Then

\[ \bar{A}_x(b) = \int_r^s b(t)\,v(t)\,{}_tp_x\,\mu_x(t)\,dt = -\int_r^s b(t)\,v(t)\,\frac{d}{dt}\,{}_tp_x\,dt. \]

Integration by parts gives

\[ \bar{A}_x(b) = -\,b(t)\,v(t)\,{}_tp_x\Big|_r^s + \int_r^s \left[b'(t)\,v(t) - b(t)\,v(t)\,\delta(t)\right]{}_tp_x\,dt. \]

Simplifying,

\[ \bar{A}_x(b) = b(r)\,v(r)\,{}_rp_x - b(s)\,v(s)\,{}_sp_x + \bar{a}_x(b') - \bar{a}_x(\delta b). \tag{8.24} \]

Here δb is just the function defined by (δb)(t) = δ(t)b(t).


This is in fact a continuous analogue of (5.8). We need the two additional terms at the 
beginning to handle the possible jumps in the function b. 
We can use this to derive a continuous version of the endowment identity of Section 5.7.2.
Assume a constant force of interest of δ. For a 1-unit, n-year endowment insurance with
benefits paid at the moment of death, $\bar{A}_{x:\overline{n}|}$ denotes the net single premium, and $\bar{P}_{x:\overline{n}|}$ denotes
the rate of annual premium, payable continuously for n years. In this case the function b is
constant on the interval (0, n), so b′ = 0. Taking $b = 1_n$, the first two terms in (8.24) are
$1 - v^n\,{}_np_x$, which leads to

\[ \bar{A}_{x:\overline{n}|} = \bar{A}_x(1_n) + v^n\,{}_np_x = 1 - \delta\,\bar{a}_x(1_n), \]

and, dividing by $\bar{a}_x(1_n)$,

\[ \bar{P}_{x:\overline{n}|} = \frac{1}{\bar{a}_x(1_n)} - \delta. \]


8.9 Differential equations for reserves 


The goal of this section is to develop a differential equation for reserves, which is analogous 
to the recursion formulas of Sections 2.10.4 and 6.3. 

Suppose we have a transaction with continuous cash flows defined by the cash flow
function c defined on [0, N], and a general discount function y. The reserve at time t, which
we will denote by ${}_t\bar{V}(c; y)$, is defined exactly as in the discrete case, namely as the negative of
the value at time t of all future cash flows. So

\[ {}_t\bar{V}(c; y) = -\int_0^{N-t} c(t + r)\,y(t, t + r)\,dr. \]

It is convenient to make a change of variable from r to s = r + t and write

\[ {}_t\bar{V}(c; y) = -\int_t^N c(s)\,y(t, s)\,ds = -y(t, 0)\int_t^N c(s)\,y(s)\,ds. \]

Differentiating by the product rule, using (8.3) to differentiate the first factor in the last
expression, and the fundamental theorem of calculus to differentiate the second factor, gives

\[ \frac{d}{dt}\,{}_t\bar{V}(c; y) = \delta_y(t)\,{}_t\bar{V}(c; y) + c(t), \tag{8.25} \]


a continuous analogue of (2.26). 

Consider now an insurance policy on (x) with death benefits of b(t) paid at the moment of
death, and premiums paid continuously at the rate of π(t) at time t. Let $c(t) = \pi(t) - b(t)\mu_x(t)$,
the annual rate of continuous cash flow. The reserve at time t on this policy is just ${}_t\bar{V}(c; y_x)$,
which we will denote by just ${}_t\bar{V}$. Substituting in (8.25), and using (8.17),

\[ \frac{d}{dt}\,{}_t\bar{V} = {}_t\bar{V}\left(\delta(t) + \mu_x(t)\right) + \pi(t) - b(t)\,\mu_x(t), \]

or

\[ \frac{d}{dt}\,{}_t\bar{V} = \delta(t)\,{}_t\bar{V} + \pi(t) - \mu_x(t)\left(b(t) - {}_t\bar{V}\right). \]

These last two formulas are the continuous versions of (6.7) and (6.8), respectively. In the
first one, we view accumulation at interest and survivorship. In the second, we view the
accumulation at interest only, and take the death benefit as the net amount at risk, $b(t) - {}_t\bar{V}$.
This result is often referred to as Thiele’s differential equation, named after the Danish actuary
T. N. Thiele, who discovered it in the late 19th century.

In certain simple cases, for example, constant μ and δ, this equation can be solved
analytically. See, for example, Exercise 8.21(b). In general however, the only feasible approach
is a numerical one. It is of interest to see what the solution will look like when we employ
Euler's method, one of the main procedures for numerical solutions of differential equations.
We describe this first for a general first order equation of the form

\[ f'(t) = g(t, f(t)), \]

given an initial value $f(t_0)$.

We pick a step size h > 0 and then simply replace the derivative with the difference
quotient [f(t + h) − f(t)]/h, leading to a recursion

\[ f(t + h) = f(t) + h\,g(t, f(t)). \tag{8.26} \]

We then work backwards and forwards from $t_0$, starting with the given value of $f(t_0)$. For example,
from (8.26) we get an approximate value of $f(t_0 + h)$ and we then again apply (8.26) with that
value to get an approximate value of $f(t_0 + 2h)$, and so on. Iterating the procedure we can
compute approximate values of f(t), where t differs from $t_0$ by an integral multiple of h. For
sufficiently small h, this should give reasonable approximations to the true values. Applying
this to Thiele's equation we get

\[ {}_{t+h}\bar{V} = {}_t\bar{V}\left(1 + h\,\delta(t)\right) + h\,\pi(t) - h\,\mu_x(t)\left(b(t) - {}_t\bar{V}\right). \tag{8.27} \]

Note that what we get is very close to Equation (6.8) applied to time units of length h,
where for the period running from time t to time t + h, we take the interest rate for this period
as hδ(t), the mortality rate for this period as $h\mu_x(t)$, and the premium paid for this period as hπ(t).
The only difference is that (8.27) does not apply the interest factor to the premium, assuming
in effect that it is paid at the end of the period. For very short periods this will make little
difference.
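To see how the Euler recursion (8.27) behaves, one can run it on a case where the reserve is known in closed form. The sketch below is illustrative only: it assumes Demoivre's law with 50 years remaining to the limiting age (so the force of mortality at time t is 1/(50 − t)), a constant force of interest of 0.05, a 1-unit whole-life benefit, and a level continuous premium. The closed-form values use the fact that, under this law, the whole-life net single premium is $(1 - e^{-\delta T})/(\delta T)$ with T years remaining, together with the whole-life identity $\bar{A} = 1 - \delta\bar{a}$. All names are hypothetical.

```python
import math

delta = 0.05          # assumed constant force of interest
span = 50.0           # assumed omega - x under Demoivre's law
benefit = 1.0         # level death benefit b(t) = 1

def whole_life_value(remaining):
    """Net single premium for a life with `remaining` years to the limiting age."""
    return (1 - math.exp(-delta * remaining)) / (delta * remaining)

A0 = whole_life_value(span)
premium = A0 / ((1 - A0) / delta)      # level continuous premium rate

def euler_reserve(t_end, h):
    """Step the recursion (8.27) forward from a reserve of 0 at time 0."""
    steps = round(t_end / h)
    V = 0.0
    for k in range(steps):
        t = k * h
        mu = 1.0 / (span - t)          # Demoivre force of mortality at time t
        V = V * (1 + h * delta) + h * premium - h * mu * (benefit - V)
    return V

def exact_reserve(t):
    """Prospective reserve: value of future benefits less value of future premiums."""
    A_t = whole_life_value(span - t)
    return A_t - premium * (1 - A_t) / delta

for h in (1.0, 0.1, 0.001):
    print(h, euler_reserve(10.0, h), exact_reserve(10.0))
```

Shrinking the step size brings the Euler value closer to the exact reserve, as one would expect from a first order method.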


8.10 Some examples of exact calculation 


As we indicated above, the normal approach to calculating values in the continuous case is to 
invoke UDD and reduce to the yearly case. Conceivably, one could have some nice analytic 
expressions for mortality functions and therefore be able to calculate values exactly. This is 
rarely done in practice. In the first place, it seems to be extremely difficult to find functions 
in closed form that provide an accurate picture of observed mortality. Second, even with such 
functions, it may be difficult to actually carry out the integration. For illustrative purposes,
however, we will look at a few particularly simple cases. These are not meant to be typical 
of modern human mortality. In Chapter 14, we will introduce some more realistic mortality 
functions. 


8.10.1 Constant force of mortality 


Suppose that μ(x) is a constant μ for all x. This of course is unrealistic, since we would expect,
as verified by observed data, that in general μ(x) increases with x. This is precisely what
we mean by the aging process. There are exceptions. In the very early years there appears
to be a decrease in μ(x), reflecting the effect of early childhood diseases. There is also a
noted decrease in the early adult ages, around 25 or so. Many attribute this to the effect of
a large number of automobile accident deaths for people in their early twenties. From age
30 onwards there is a rapid, exponential growth in μ(x), until around age 70, where μ(x)
continues to increase but the exponential growth dampens out. There are some who predict
that with medical and genetic advances, we may be approaching a situation where we can
freely transplant or replace any genes or body parts that go awry. They postulate in essence
that we will then have eliminated aging and achieved a constant force of mortality. This does
not mean that people will no longer die, but only that the 90-year-old will be no more likely
to die at any time than the 20-year-old. At the present time, however, we are not at all close
to this state and it is still in the realm of science fiction.

In any event, we look at some of the mathematical consequences of this assumption. In
the first place, note that ω is now ∞, since ${}_tp_x = e^{-\mu t} > 0$, which means that there is at any
time a positive probability of continuing to survive.

In our examples we will also assume a constant force of interest δ. Our first observation
is that with these assumptions,

\[ \bar{a}_x = \int_0^{\infty} e^{-\delta t}\,e^{-\mu t}\,dt = \int_0^{\infty} e^{-(\mu + \delta)t}\,dt = \frac{1}{\mu + \delta}. \]

This is easily verified intuitively. As we have seen, the force of discount for the interest and
survivorship discount function under our assumptions is the constant μ + δ. Therefore, an
investment of 1/(μ + δ) will provide a continuous return at the annual rate of [1/(μ + δ)] ·
[μ + δ] = 1.

Taking δ = 0, we see as well that

\[ \mathring{e}_x = \frac{1}{\mu}, \]

which again makes sense, since as μ increases, life expectancy should decrease.

Turning to insurance, the constant forces of interest and mortality lead to

\[ \bar{A}_x = \int_0^{\infty} e^{-\delta t}\,e^{-\mu t}\,\mu\,dt = \frac{\mu}{\mu + \delta}. \tag{8.28} \]

Note, as a check, that when δ = 0 we get $\bar{A}_x = 1$. This is obvious, since the insured is sure to
die sometime and receive 1 unit, which has a present value of 1 under a zero interest rate.
This formula has an interesting alternate derivation which avoids any integration. Suppose
you have wealth A invested at a constant force of interest δ and each instant of time, you use
the interest earnings of δA to buy an instant of term insurance, which will cost you μ per unit.
So the amount of insurance purchased at each unit of time will be (δ/μ)A. At death you will
have the principal of A plus the insurance benefit, so to make the total wealth at death equal
to 1, we want (δ/μ)A = 1 − A. Solving for A gives precisely the above quantity.

For a 1-unit whole-life policy, with level premiums payable continuously for life, the
annual rate of premium payment is given by

\[ \bar{P}_x = \frac{\bar{A}_x}{\bar{a}_x} = \mu, \]

and the reserve on such a policy at any time t is given by

\[ \bar{A}_{x+t} - \bar{P}_x\,\bar{a}_{x+t} = 0, \]


as we see from substituting from above. In fact, this is just the continuous analogue of Example 
6.2. Since the cost of providing the insurance does not increase with time, the amount collected 
at any instant is exactly what is needed at that instant, and there is no accumulation of a reserve. 


8.10.2 Demoivre's law


Another extremely simple mortality assumption states that for all ages x,

\[ {}_tp_x = 1 - \frac{t}{\omega - x}, \qquad 0 \le t < \omega - x. \]

We have already encountered this law in Example 8.1. It is known as Demoivre’s law,
named after Abraham Demoivre, an eighteenth-century mathematician, perhaps best known
for his trigonometric identities. He is reputed to have used the above formula in making some
life table calculations. We can look upon his reasoning as follows. We know that for fixed x,
the function ${}_tp_x$ decreases from a value of 1 at t = 0 to a value of 0 at t = ω − x. Equipped with
no other information at all, Demoivre made the simplest possible assumption, namely that the
graph of the function was a straight line. It is obviously far too simplistic to be representative
of actual mortality, but we will nonetheless investigate its mathematical consequences.

We get a very simple expression for life expectancy: 


s d t 1 
= = )ar= 5 gy. 
E i (1- —)a- 1o -» 


That is, at any age, one can expect to live one-half of the maximum remaining lifetime. 
Similarly, taking an upper limit of n in the above integral, we calculate the n-year temporary 

life expectancy as n — n?/2(o — x) 2 n ( " /2Px)- That is, we multiply the maximum possible 

future lifetime over the next n years by the probability of living to the middle of this period. 
The force of mortality takes a particularly simple form. 

\[ \mu_x(t) = -\frac{\frac{d}{dt}\,{}_tp_x}{{}_tp_x} = \frac{1}{\omega - x - t}, \qquad 0 \le t < \omega - x, \]

which verifies Example 8.1, where we did the inverse calculation. This at least has the desirable 
feature of being increasing. 
This leads to 

\[ {}_tp_x\,\mu_x(t) = \frac{1}{\omega - x} = q_x, \qquad 0 \le t < \omega - x. \]
We noticed above that this formula held under UDD for an integer x and t in the interval (0,1). 
Under Demoivre's law, it holds for all x and t. As a consequence, 


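\[ \bar{A}_x = \frac{1}{\omega - x}\int_0^{\omega - x} v(t)\,dt = \frac{1}{\omega - x}\,\bar{a}(1_{\omega - x}; v). \qquad (8.29) \]

Once again, we can check that under zero interest, we get $\bar{A}_x = 1$. Similarly, 

\[ \bar{A}_x(1_n) = \frac{1}{\omega - x}\,\bar{a}(1_n; v). \]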
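
As an illustration of (8.29), here is a small sketch that assumes a constant force of interest δ, so that v(t) = e^{-δt} and the integral has a closed form. The constant-interest assumption and the function names are ours, made only for this example.

```python
import math

def whole_life_demoivre(x, omega, delta):
    """(8.29) with v(t) = exp(-delta * t): the average of v(t) over [0, omega - x]."""
    span = omega - x
    if delta == 0:
        return 1.0                      # zero interest: the unit benefit is certain
    return (1 - math.exp(-delta * span)) / (delta * span)

def term_demoivre(x, omega, delta, n):
    """n-year term version: v(t) is integrated only up to n <= omega - x."""
    if delta == 0:
        return n / (omega - x)
    return (1 - math.exp(-delta * n)) / (delta * (omega - x))
```

Taking n = ω - x in `term_demoivre` recovers the whole-life value, and letting δ tend to 0 gives a value tending to 1, in line with the zero-interest check above.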
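A variation of Demoivre's law, which we will call the modified Demoivre's law, is given 
by 

\[ \mu_x(t) = \frac{\alpha}{\omega - x - t}, \qquad 0 \le t < \omega - x, \]

for some $\alpha > 0$. Our original form was with $\alpha = 1$. Using (8.16), this means that 

\[ {}_tp_x = \left(1 - \frac{t}{\omega - x}\right)^{\alpha}, \qquad 0 \le t \le \omega - x, \]

which leads to a simple life expectancy formula, 

\[ \mathring{e}_x = \frac{\omega - x}{\alpha + 1}. \]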
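
Integrating the survival function of the modified law over the maximum remaining lifetime gives (ω - x)/(α + 1). The sketch below checks this numerically; the values x = 40, ω = 100 and α = 2 are illustrative assumptions, and the function names are hypothetical.

```python
def modified_demoivre_survival(t, x, omega, alpha):
    """t_p_x = (1 - t/(omega - x))**alpha for 0 <= t <= omega - x."""
    return (1.0 - t / (omega - x)) ** alpha

def life_expectancy(x, omega, alpha, steps=100_000):
    """Midpoint-rule approximation to the integral of t_p_x over [0, omega - x]."""
    h = (omega - x) / steps
    return sum(modified_demoivre_survival((i + 0.5) * h, x, omega, alpha)
               for i in range(steps)) * h

x, omega, alpha = 40, 100, 2            # illustrative values, not from the text
numeric = life_expectancy(x, omega, alpha)
closed_form = (omega - x) / (alpha + 1)
```

With α = 1 the same function returns one-half of ω - x, which is the original Demoivre result.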


8.10.3 An example of the splitting identity 


The following is an example showing the use of the splitting identity in the continuous case. 
Example 8.5 Suppose that the force of mortality is given by 

\[ \mu(x) = \begin{cases} 0.02, & 30 \le x < 50, \\ \dfrac{1}{100 - x}, & 50 \le x < 100. \end{cases} \]

The force of interest $\delta$ is a constant 5%. Calculate $\bar{A}_{30}$. 


Solution. By the splitting identity, 

\[ \bar{A}_{30} = \bar{A}_{30}(1_{20}) + v(20)\,{}_{20}p_{30}\,\bar{A}_{50}. \]
Proceeding as in the previous examples of this section, 

\[ \bar{A}_{30}(1_{20}) = \frac{\mu}{\mu + \delta}\left(1 - e^{-20(\mu + \delta)}\right), \qquad \bar{A}_{50} = \frac{1 - e^{-50\delta}}{50\delta}, \qquad v(20)\,{}_{20}p_{30} = e^{-20(\mu + \delta)}. \]

Substituting for $\mu$ and $\delta$, we get $\bar{A}_{30} = 0.3058$. 
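
The arithmetic of Example 8.5 can be reproduced in a few lines. This sketch assumes that mortality from age 50 follows Demoivre's law with ω = 100, which is what the $1/(50\delta)$ factor in the solution suggests; the variable names are our own.

```python
import math

mu, delta = 0.02, 0.05      # force of mortality on [30, 50) and force of interest
omega = 100                 # limiting age assumed for the Demoivre branch

# 20-year term insurance on (30) under a constant force of mortality
term_30 = mu / (mu + delta) * (1 - math.exp(-20 * (mu + delta)))
# Whole-life insurance on (50) under Demoivre's law, formula (8.29)
whole_50 = (1 - math.exp(-(omega - 50) * delta)) / ((omega - 50) * delta)
# Interest and survivorship discount over the first 20 years
discount = math.exp(-20 * (mu + delta))

A_30 = term_30 + discount * whole_50    # splitting identity
print(round(A_30, 4))                   # prints 0.3058
```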


8.11 Further approximations from the life table 


Suppose you want to approximate the force of mortality given only a life table. For ages that 
are not integers, the UDD assumption gives an easy answer. For an integer x and $0 < t < 1$, 
we obtain from (8.20) that 

\[ \mu(x + t) = \frac{q_x}{1 - t\,q_x}. \]

Suppose now we want to approximate $\mu(x)$ where x is an integer. We could take $t = 0$ in the 
above, to get an estimated value of $\mu(x)$ as just $q_x$, but we could also replace x by $x - 1$ and 
take $t = 1$ to get an estimated value of $\mu(x)$ as $q_{x-1}/(1 - q_{x-1})$. These are unlikely to be the 
same. It is easy to see where the problem lies. From UDD, the function ${}_tp_x$ is piecewise linear 
as a function of t and the derivative will not exist at integer values of t. We are in effect getting 
the different right and left hand forces of mortality at the integer points. To get a unique 
answer we need to consider an interval containing x as an interior point, such as $(x - 1, x + 1)$. 
One such approach is to note that 

\[ p_{x-1}\,p_x = {}_2p_{x-1} = e^{-\int_{x-1}^{x+1} \mu(y)\,dy}. \]

We now approximate the integral by one of the simplest methods for doing so, namely the 
mid-point rule. This simply says that to approximate an integral of a nonnegative function 
over an interval you take the value of the function at the midpoint of the interval and multiply 
by the length of the interval. (This approximation is exact when the function is linear.) With 
this method, the right hand side above is approximated by $e^{-2\mu(x)}$, and solving gives the 
formula 

\[ \mu(x) \approx -\frac{\log p_{x-1} + \log p_x}{2}. \qquad (8.30) \]

(See Exercise 8.24 for a check on the accuracy of this approximation.) 
This can be further simplified by using the fact that for small values of r, $-\log(1 - r)$ 
is very close to r, which leads to the approximation 

\[ \mu(x) \approx \frac{q_{x-1} + q_x}{2}, \qquad (8.31) \]

an average of two values of q at surrounding ages, which has a certain intuitive appeal. 
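
The two estimates of the force of mortality at an integer age, (8.30) and (8.31), can be sketched as follows. The function names are our own, and the check uses a hypothetical constant force of 0.03 chosen only because the true answer is then known.

```python
import math

def mu_log_average(q_prev, q_curr):
    """Formula (8.30): minus the average of log p_{x-1} and log p_x."""
    return -(math.log(1 - q_prev) + math.log(1 - q_curr)) / 2

def mu_q_average(q_prev, q_curr):
    """Formula (8.31): the average of q_{x-1} and q_x."""
    return (q_prev + q_curr) / 2

# Hypothetical check: a constant force of 0.03, so that p = exp(-0.03) at every age
true_mu = 0.03
q = 1 - math.exp(-true_mu)
est_830 = mu_log_average(q, q)   # recovers 0.03 up to rounding
est_831 = mu_q_average(q, q)     # slightly below 0.03, since q < -log(1 - q)
```

In this constant-force case the mid-point rule is exact, so (8.30) returns the true force, while (8.31) is a little smaller.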
We now return to developing the three-term Woolhouse formula introduced in 7.5, which 
offers an alternative method to the $\alpha(m)$-$\beta(m)$ method for fractional annuity computations. 
For motivation we revisit Formula (7.13) which arose from the assumption that for all integers 
x, and $0 < s < 1$, the quantity $y_x(s)$ is of the form $1 - as$ where $a = 1 - vp_x$. This assumption, 
together with constant interest, implies that 

\[ {}_sp_x = e^{\delta s}(1 - as), \]
so that 

\[ \frac{d}{ds}\,{}_sp_x = e^{\delta s}\left[\delta(1 - as) - a\right], \]

and at $s = 0$ this derivative equals $\delta - a$. This will be positive when $\delta > a = 1 - vp_x$, which is 
equivalent after some manipulation to 

\[ p_x > e^{\delta}(1 - \delta). \]


This explains the inconsistency observed in Example 7.5. If the above condition holds, 
and we assume the piecewise linearity of y,(s) as given above, the positivity of the derivative 
at $s = 0$ means that ${}_sp_x$ actually takes values greater than 1 for sufficiently small values of s! 
Writing this in terms of $\ell$, we can interpret it fancifully as meaning that people who have 
died are coming back to life in even greater numbers than were there before, causing the life 
annuity to be worth more than the one with fixed payments. 
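
The inconsistency can be seen numerically. The sketch below assumes a constant force of interest δ = 0.10 and two hypothetical one-year survival probabilities, one on each side of the threshold $e^{\delta}(1 - \delta)$; the function names are our own.

```python
import math

def implied_survival(s, p_x, delta):
    """s_p_x implied by y_x(s) = 1 - a*s, with a = 1 - v*p_x and v = exp(-delta)."""
    a = 1 - math.exp(-delta) * p_x
    return math.exp(delta * s) * (1 - a * s)

def condition_holds(p_x, delta):
    """True when p_x > e^delta * (1 - delta), so the derivative at s = 0 is positive."""
    return p_x > math.exp(delta) * (1 - delta)

delta = 0.10                    # illustrative force of interest
high, low = 0.999, 0.95         # illustrative one-year survival probabilities
early_high = implied_survival(0.01, high, delta)   # exceeds 1
early_low = implied_survival(0.01, low, delta)     # stays below 1
```

For the higher survival probability the implied value of ${}_sp_x$ just after time 0 is greater than 1, while for the lower one it is not. In both cases the implied value at s = 1 is the given $p_x$.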

Suppose we instead make the next simplest assumption. Assume that $y_x$ is a quadratic on 
[0, 1] so that for some constants a, b, 

\[ y_x(s) = 1 - as - bs^2, \qquad 0 \le s \le 1. \qquad (8.32) \]


This is the only assumption we make now. We do not need constant interest. 
Choose any integer m and invoke well-known formulas for the sum of consecutive integers 
(as we did in Chapter 7) as well as the formula for the sum of their squares. These result in 


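\[ \sum_{i=0}^{m-1} \frac{i}{m} = \frac{m-1}{2}, \qquad \sum_{i=0}^{m-1} \frac{i^2}{m^2} = \frac{(m-1)(2m-1)}{6m}. \]

We now follow the method introduced in Section 7.1 to compute the relevant vector u. It 
follows from (8.32) that 

\[ u_0 = \frac{1}{m}\sum_{j=0}^{m-1} y_x(j/m) = 1 - (a + b)\left(\frac{m-1}{2m}\right) + 2b\left(\frac{m^2-1}{12m^2}\right). \]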


Next, calculate the coefficients $a + b$ and $2b$. From (8.32) 

\[ a + b = 1 - y_x(1) = (\nabla 1_\infty)_0, \qquad (8.33) \]

using the notation of (2.14) with respect to the discount function $y_x$. 
We now consider the interest and survivorship force of discount $\lambda_x(s) = -y_x'(s)/y_x(s)$, and 
let $\lambda_x$ denote the vector whose entry at index k is $\lambda_x(k)$. From (8.32) 

\[ y_x'(s) = -a - 2bs, \]

so that assuming differentiability at integer points, 

\[ 2b = y_x'(0) - y_x'(1) = -\lambda_x(0) + y_x(1)\lambda_x(1) = -(\nabla\lambda_x)_0. \qquad (8.34) \]
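From (8.33) and (8.34) 

\[ u_0 = 1 - \left(\frac{m-1}{2m}\right)(\nabla 1_\infty)_0 - \left(\frac{m^2-1}{12m^2}\right)(\nabla\lambda_x)_0. \]

Similarly we can verify that 

\[ u_k = 1 - \left(\frac{m-1}{2m}\right)(\nabla 1_\infty)_k - \left(\frac{m^2-1}{12m^2}\right)(\nabla\lambda_x)_k. \]

Then by invoking (7.2) and (2.15) we arrive at the three-term Woolhouse formula. 

\[ \ddot{a}_x^{(m)}(c) = \ddot{a}_x(c) - \frac{m-1}{2m}\,\ddot{a}_x(\Delta c) - \frac{m^2-1}{12m^2}\,\ddot{a}_x(\lambda_x * \Delta c). \qquad (8.35) \]
The particular form in the case of level payments is 

\[ \ddot{a}_x^{(m)}(1_n) = \ddot{a}_x(1_n) - \frac{m-1}{2m}\left(1 - v(n)\,{}_np_x\right) - \frac{m^2-1}{12m^2}\left(\lambda_x(0) - \lambda_x(n)\,v(n)\,{}_np_x\right). \]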


A direct calculation from (8.35) is not practical, as normally we will be computing from 
the life table and will not have available the exact values of $\mu(x)$ needed to compute $\lambda_x$. 
However we can simply use the approximations given by (8.30) or (8.31) in order to derive 
numerical results. 

Formula (8.35) seems to avoid for the most part the inconsistency of a life annuity having 
a higher present value than the corresponding fixed interest period annuity, but this can still 
arise in extreme cases. See Exercise 8.27. 
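
To get a feel for the size of the third term, the level-payment form can be tried in a case where the exact answer is available. The sketch below assumes constant forces of mortality and interest, so that $\lambda_x(k) = \mu + \delta$ for every k; the forces 0.2 and 0.1 are those of Exercise 8.26, while n = 1, m = 12 and the function names are our own choices.

```python
import math

def woolhouse_level(n, m, mu, delta, terms=3):
    """Approximate the n-year annuity-due of annual rate 1 payable m times a year.
    Assumes constant forces mu and delta, so lambda_x(k) = mu + delta for all k."""
    lam = mu + delta
    y_n = math.exp(-lam * n)                        # v(n) * n_p_x
    annual = sum(math.exp(-lam * k) for k in range(n))
    value = annual - (m - 1) / (2 * m) * (1 - y_n)
    if terms == 3:
        value -= (m * m - 1) / (12 * m * m) * (lam - lam * y_n)
    return value

def exact_level(n, m, mu, delta):
    """Direct sum of the n*m payments of 1/m under the same constant forces."""
    lam = mu + delta
    return sum(math.exp(-lam * j / m) for j in range(n * m)) / m

two = woolhouse_level(1, 12, 0.2, 0.1, terms=2)
three = woolhouse_level(1, 12, 0.2, 0.1)
exact = exact_level(1, 12, 0.2, 0.1)
```

In this case the three-term value differs from the exact one by roughly one unit in the fifth decimal place, against several units in the third decimal place for the two-term value.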


8.12 Standard actuarial notation and terminology 


We first review some terminology. Policies where benefits are paid at the moment of death, 
and premiums are paid continuously are sometimes referred to as fully continuous. Policies 
where benefits are paid at the end of the year of death, and premiums are paid yearly, are 
sometimes referred to as fully discrete. Neither of these is realistic. The normal life insurance 
contract involves benefits at the moment of death, and premiums payable yearly (or possibly 
monthly, quarterly, etc.). These are referred to as semi-continuous. 
The single premium notation for standard insurance and annuity contracts is the same 
as in the yearly case, except, as we have already introduced, A is replaced by $\bar{A}$ and $\ddot{a}$ is 
replaced by $\bar{a}$. For annual premium contracts, the rules given in 5.8.2 apply. When premiums 
are payable continuously at a level rate, the same notation is used with $\bar{P}$ replacing P. So, for 
example, 

• ${}_hP(\bar{A}_{x:\overline{n}|})$ denotes the net annual level premium, payable for h years, for a 1-unit n-year 
endowment insurance on (x) with death benefits payable at the moment of death. (Note 
that we do not omit the single premium symbol, as it starts with $\bar{A}$ rather than A.) 

• $\bar{P}(\bar{A}^1_{x:\overline{n}|})$ denotes the annual rate of premium payment, when premiums are paid contin- 
uously at a level rate for n years, for a 1-unit n-year term insurance on (x) with benefits 
payable at the moment of death. 


Symbols for reserves follow the rules in Section 6.9, which say essentially that you simply 
take the premium symbol and replace P by V, or $\bar{P}$ by $\bar{V}$, and move the duration symbol to the 
top rather than bottom left. So, for example, the tth year reserve for the two above policies 
would be denoted by ${}^h_tV(\bar{A}_{x:\overline{n}|})$ and ${}_t\bar{V}(\bar{A}^1_{x:\overline{n}|})$, respectively. 

The assumption of Exercise 7.13 is sometimes referred to as the constant force assumption 
since it implies that the force of mortality is constant over each year, as shown by Exercise 
8.21 below. It should not be confused with the assumption of a constant force over the entire 
span of life, as we discussed in Section 8.10. 


Notes and references 


The derivation of (8.18) uses a Riemann-Stieltjes integral (see Rudin 1976, Theorem 6.17). 

For an alternate derivation of the general Woolhouse formula (which follows the original 
derivation of W. Woolhouse in 1859) see Dickson et al. (2013), Appendix B.2. In Section 5.13 
of that work, the authors take a known function for $\mu$, and show that the three-term formula 
gives remarkably close approximations to the true value. See also Exercise 8.26 for more 
confirmation of this. 


Exercises 


Type A exercises 


8.1 A life annuity on (x) provides for benefits made continuously for 2 years, provided that 
(x) is alive. The annual rate of payment at time t is c(t), defined by 

\[ c(t) = \begin{cases} t, & 0 \le t < 1, \\ 1, & 1 \le t \le 2. \end{cases} \]

The interest rate is 0, $q_x = 0.3$ and $q_{x+1} = 0.4$. Find the present value, assuming UDD. 


8.2 Given $q_x = 0.2$, $q_{x+1} = 0.25$, $i = 0.25$ and a vector c = (100, 300), calculate $\bar{a}_x(c)$, 
assuming UDD. 


8.3 You are given that $q_{40} = 0.1$, $q_{41} = 0.2$, the rate of interest is a constant 25% and that 
UDD holds. Find the present value of a 2-year endowment insurance on (40), paying 
1000 at the moment of death if this occurs before age 42, and 1000 at age 42 if (40) is 
alive at that time. 


8.4 Suppose that $q_{50} = 0.2$. Assume UDD. 
(a) Calculate $\mu_{50}(0.2)$. 

(b) Given that the interest rate is 25%, find the single premium for a 1-year term 
insurance policy on (50) that pays 1 unit at the moment of death should this occur 
within 1 year. 


8.5 Suppose that the force of mortality is given by 
\[ \mu_x = B(1.09)^x \]

and 

(a) Find a value for B such that the resulting mortality table is the illustrative table of 
Chapter 3. 

(b) Using this value of B compare the true value of $\mu_{50}$ with the estimated value using 
formula (8.30). 


8.6 A life annuity on (80) provides for benefits made continuously for 2 years, at the annual 
rate of c(t) at time t, provided that (80) is alive. Suppose that c(t) = t£, 0 < t < 2, and the 
interest rate is 0. Find the present value under each of the following assumptions. 

(a) $q_{80} = 0.09$, $q_{81} = 0.12$, and UDD holds. 
(b) The force of mortality is given by 

0-f 

8.7 For a certain mortality basis, $q_{50} = 0.2$. Suppose that the force of mortality $\mu_{50}(t)$ is 
changed to a new force of mortality ji given by 
Find the new value of $q_{50}$. 


8.8 The force of mortality is a constant 0.04 and the force of interest is a constant 0.06. A 
term insurance policy has a level death benefit of 1 payable at the moment of death if 
this occurs within 40 years. Level annual premiums are payable continuously for 20 
years at the annual rate of $\pi$. Find (a) $\pi$, (b) ${}_{10}V$. 


8.9 Suppose that the force of mortality for ages over 60 is given by 

D= t O<t<l, 

Find the probability that (60) will die within 2 years. 
8.10 Suppose mortality follows Demoivre's law with $\omega = 110$ and that the interest rate is 
0. A term insurance policy on (60) has a level death benefit of 1000 payable at the 
moment of death provided this occurs within 40 years. Net level premiums are payable 
continuously for 20 years at the annual rate of $\pi$. (a) Find $\pi$. (b) Find ${}_{10}V$ by both the 
prospective and retrospective methods. 


Two actuaries, A and B, agree that the probability that a female age 60 will die within 
10 years is 0.36. 


(a) Actuary A decides that the force of mortality for male lives is 1.5 times the force of 
mortality for female lives, at all ages over 60. What is the probability that a male 
age 60 will die within 10 years, under A's assumption? 


(b) Actuary B disagrees and decides that the force of mortality for male lives is 
obtained by adding a constant of 0.01 to the force of mortality for female lives, at 
all ages over 60. What is the probability that a male age 60 will die within 10 years 
under B's assumption? 


An insurance policy provides for death benefits payable at the moment of death. The 
amount payable for death at time t is $e^{0.08t}$, for all $t > 0$. The force of mortality is 
a constant 0.06 and the force of interest is a constant 0.04. Net level premiums are 
payable at a constant rate for life. Find the rate of premium payment. 


Type B exercises 


8.13 Show that under UDD, for an integer x and $0 < t < 1$, $\mu_x(t) = q_x/(1 - tq_x)$. 

8.14 The central rate of mortality at age x is defined by 

\[ m_x = \frac{d_x}{\int_0^1 \ell_{x+t}\,dt}. \]

(a) Show that $m_x$ can be expressed as a continuous weighted average of values of the 
force of mortality. 


(b) Show that under UDD, $m_x = \mu_x(\tfrac{1}{2})$. 


A whole-life insurance, with net level premiums for life payable continuously at the 
annual rate of $\pi$, provides for death benefits of $e^{\gamma t}$ at the moment of death, for death 
at time t. Given that the force of mortality is a constant $\mu$ and the force of interest is a 
constant $\delta$, such that $\mu + \delta > \gamma$, find $\pi$ in terms of $\gamma$, $\mu$ and $\delta$. 


A 1-year deferred, 1-year life annuity on (x) provides for continuous payments from 
time 1 to time 2 provided (x) is alive. The rate of payment at time 1 + rise’ ’, 0 « t « 1. 
This is purchased by a single premium. For death during the first year, the single 
premium is returned without interest at the moment of death. You are given that 
$q_x = 0.1$, $q_{x+1} = 0.2$, and that the force of interest $\delta$ is a constant 0.10. Assuming 
UDD, find the single premium. 


8.20 


8.21 
An n-year term insurance policy on (30) has a constant death benefit of 1, payable at 
the moment of death. Premiums are payable continuously at a constant rate for n years. 
Mortality follows Demoivre's law with $\omega = 100$, and the interest rate is i = 0. Find the 
rate of premium payment as a function of n. 


The force of interest is a constant $\delta$. The force of mortality is a constant $\mu_1$ for the first 
n years, and a constant $\mu_2$ after n years. Derive expressions for $\bar{a}_x$ and $\bar{A}_x$ in terms of 
$\mu_1$, $\mu_2$, $\delta$ and n. 


Suppose Demoivre's law holds and the force of interest is a constant $\delta$. Find an 
expression for $\bar{a}_x(1_n)$ in two ways. First use formula (8.9) directly, and second apply 
(8.23) to the expression given for $\bar{A}_x(1_n)$ under the same assumptions. Verify that as $\delta$ 
approaches 0, you get the formula for the n-year temporary life expectancy. 


A ‘sawtooth’ life insurance policy on (x) provides for a death benefit of t paid at death 
if this occurs at time k + t, where k is an integer and 0 < t < 1. Assume UDD and 
constant interest. Show that the net single premium for this policy is equal to $\beta(\infty)A_x$. 


Suppose that the force of interest is a constant $\delta$ and the force of mortality is a constant 
$\mu$. Consider an n-year endowment insurance with net premiums payable continuously 
at the constant annual rate of $\pi$. Death benefits of $b_t$ are payable at the moment of 
death for death at time $t < n$. There is a payment of 1 at time n if the insured is then 
alive. 


(a) Suppose that $b_t = 1$ for all $t < n$. Use the reserve differential equation to show that 

\[ {}_tV = (\pi - \mu)\,\frac{e^{(\mu + \delta)t} - 1}{\mu + \delta}, \qquad \pi = \mu + \frac{\mu + \delta}{e^{(\mu + \delta)n} - 1}. \]

Give an intuitive explanation of these results. Verify that these agree with the 
formulas given at the end of Section 8.8. Show that the reserve formula is a 
continuous version of formula (6.15). 


(b) Suppose now that $b_t = 1 + {}_tV$ for all $t < n$. Use Thiele's differential equation to 
derive formulas for ${}_tV$ and $\pi$. 


Show that under the assumption of Exercise 7.13, the force of mortality is constant 
over each year. That is, $\mu_x(t) = \mu_x(s)$ where x is a nonnegative integer and s and t are 
between 0 and 1. 


Give an alternate derivation of (8.12) by using the continuous endowment identity, as 


given at the end of Section 8.8, and (8.23). First prove it for c = e. 


Suppose that modified Demoivre's law holds with a = 2. If u, = 30 what is e. ? 


Suppose that the force of mortality is given by 
\[ \mu(x) = B(1.09)^x. \]

(a) Find a value for B such that the resulting mortality table is the illustrative table of 
Chapter 3. 
(b) Using this value of B compare the true value of $\mu(50)$ with the estimated value 
using formula (8.30). 


Suppose that the force of mortality is a constant 0.2 and the force of interest is a 
constant 0.1. Compute ae 1) by four methods. (a) formula (7.9), (b) formula (7.13), 
(c) formula (8.35) and (d) exactly. Compare the results. 


Find an example of some life table values and an interest rate, such that using (8.35) 
together with ( 8.30), yields an inconsistent result. Hint: Choose life table values so 
that the estimated value yz, is much higher than the estimated values of y,. 


A life is subject to a constant force of mortality of 0.15. The force of interest is a 
constant 0.10. An insurance contract on this life provides for a death benefit of eO? for 
death at time t, with level premiums payable continuously for life. 


(a) Find ${}_tV$. 


(b) Write down Thiele's differential equation for the reserve, and verify that your 
answer to (a) satisfies this equation. 


Spreadsheet exercise 


8.29 


Modify previous spreadsheets to handle the case of death benefits payable at the moment 
of death. 


Select mortality 


9.1 Introduction 


In this chapter we will discuss a refinement to our basic model. In previous chapters we 
assumed that for a sufficiently homogeneous group — North American male nonsmokers, 
for example — mortality rates are a function of attained age only. This may be a reasonably 
accurate assumption for the totality of people in this group, but it need not apply for certain 
subsets. An insurer, after all, is not interested in the mortality of the general population, 
but only the subset of those who buy policies. Observed data show that for purchasers of 
life insurance, mortality depends not only on age, but on duration since the policy was 
purchased. To see why this should be true, let us compare two people both age 60, alike 
in all respects except that A just recently purchased an insurance policy, and B purchased 
a policy 1 year ago at age 59. During the next year, we would expect that A will have a 
higher probability of living to age 61 than B will. This arises from the fact that the insurer 
does not have to accept everyone who applies for a policy. Applicants can be requested 
to undergo medical exams, or to furnish information regarding their health, lifestyle and 
other factors, in order to verify that they are reasonable risks. Individual A has just been 
so certified. Individual B, on the other hand, was certified as a good risk 1 year ago, but 
this condition could have changed. B may have contracted a fatal disease, or taken up life- 
threatening habits. Moreover, if we consider a third individual, C, also age 60, similar to A 
and B, except that he/she was sold a policy at age 58, his/her chance of living over the next 
year can be expected to be even less than that of B. This person has had 2 years in which 
to deteriorate. 

The standard actuarial method to handle this situation is to use the same notational device 
that we introduced in Section 4.3.2. We keep the convention that subscripts denote attained 
age, but put square brackets around the age that the policy was issued, in order to separate 
age and duration. Therefore, the symbol ${}_sq_{[x]+t}$ denotes the probability that a person now 
age x + t, who was sold insurance at age x, will die within the next s years. This is known 
as a select mortality rate, as it incorporates the effects of the insurer's ability to select. We 
define 

\[ {}_sp_{[x]+t} = 1 - {}_sq_{[x]+t}. \]


As usual we omit left subscripts of 1 for q and p. Our discussion above showed that 

\[ q_{[60]} \le q_{[59]+1} \le q_{[58]+2} \le \cdots \]

and, in general, 

\[ q_{[x]+t} \le q_{[x-s]+t+s} \qquad (9.1) \]

for all nonnegative x and t and $0 \le s \le x$. 


9.2 Select and ultimate tables 


To fully model this phenomenon we would need a separate mortality table for each age x 
to cover policies issued at that age. Of course the tables would get shorter with increasing 
issue age. The age x table would consist of the $\omega - x$ entries $\{q_{[x]+t},\ t = 0, 1, \ldots, \omega - x - 1\}$. 
Observations show, however, that the selection effects decrease with time and can be assumed 
to wear off after a certain duration, known as the select period. If the select period is r, we 
can expect that two individuals of the same age, who have been issued insurance r or more 
years ago, will exhibit the same mortality rates, even if the issue age is different. This allows 
for some simplification. We can revert to the symbol $q_y$ to mean the probability that a person 
now age y who was issued insurance r or more years ago will die within a year. In other 
words, we postulate that equality will hold in (9.1) when $t \ge r$, so we can remove the square 
bracket and denote the common value by $q_{x+t}$. The rates $q_y$ are known as ultimate rates, and 
the resulting life table is known as a select and ultimate table. A common choice for select 
period is 15 years, although some recent tables have used as much as 25. The simple example 
below has a select period of 2, but that is enough to illustrate the basic idea. 


x     $q_{[x]}$   $q_{[x]+1}$   $q_{x+2}$   x + 2
60    0.10      0.12        0.15      62
61    0.11      0.14        0.17      63
62    0.12      0.15        0.18      64


To find the mortality rates applicable to a policy issued at age x, we start at the left column, 
read across until we get to the ultimate column, and then read down. So, for example, the rates 
used for a policy issued at age 60 would be in order $q_{[60]}$, $q_{[60]+1}$, $q_{[60]+2} = q_{62}$, $q_{63}$, $q_{64}, \ldots$ 
which in this example are 0.10, 0.12, 0.15, 0.17, 0.18,.... We can picture the flow of mortality 
rates as several streams, all running into the same river, which is the ultimate column. 
Therefore, we do not need a complete separate table for each age, but only r + 1 columns. 
Tables with a select period of 0 years, which is precisely what we have been discussing in 
previous chapters, are known as aggregate tables. 
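
The rule of reading across a row and then down the ultimate column can be sketched as follows, using the small table above. The dictionary layout and the function name are our own illustrative choices.

```python
# Rows: issue age x -> (q_[x], q_[x]+1, q_{x+2}); the select period is r = 2
SELECT_TABLE = {
    60: (0.10, 0.12, 0.15),
    61: (0.11, 0.14, 0.17),
    62: (0.12, 0.15, 0.18),
}
SELECT_PERIOD = 2

def rates_for_issue_age(x):
    """Read across the row for issue age x, then down the ultimate column."""
    rates = list(SELECT_TABLE[x][:SELECT_PERIOD])        # select columns
    age = x
    while age in SELECT_TABLE:                           # ultimate column
        rates.append(SELECT_TABLE[age][SELECT_PERIOD])
        age += 1
    return rates

print(rates_for_issue_age(60))   # prints [0.1, 0.12, 0.15, 0.17, 0.18]
```

A policy issued at age 61 joins the same ultimate column one row lower, so its rates end with the same values 0.17, 0.18.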

Some people prefer to construct a select and ultimate table of $\ell_x$, although this is not 
really necessary, for as we have seen, one can always work directly with the values of $q_x$. 
The process is straightforward, although it requires a little care. One completes the ultimate 
column first, proceeding as in Chapter 3 to calculate values of $\ell_x$ starting with an arbitrary 
value for the $\ell$ which is the first entry in this column. One then simply works backwards to 
complete the other entries, using the recursion 

\[ \ell_{[x]+k} = \frac{\ell_{[x]+k+1}}{1 - q_{[x]+k}}. \]

For example, in the above table, if $\ell_{63} = 5000$, then $\ell_{[61]+1} = 5000/0.86 = 5814$ and $\ell_{[61]} = 
5814/0.89 = 6533$. 
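
The backward recursion is short enough to carry out directly. This sketch uses the select rates for issue age 61 from the table and the starting value of 5000 at age 63 assumed in the text; the variable names are our own.

```python
q_select_61 = [0.11, 0.14]      # q_[61] and q_[61]+1 from the table
l_ultimate_63 = 5000.0          # the ultimate value at age 63 assumed in the text

# Work backwards from the ultimate column: each entry is the next one
# divided by the one-year survival probability at that duration
l_values = [l_ultimate_63]
for q in reversed(q_select_61):
    l_values.insert(0, l_values[0] / (1 - q))

l_61, l_61_plus_1, l_63 = l_values
```

Rounded to whole numbers, the two select entries are 5814 at duration 1 and 6533 at duration 0.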


9.3 Changes in formulas 


To change our previous formulas to incorporate select mortality is usually just a matter of 
inserting a square bracket around the issue age. We will look at some of these in detail. 


Chapter 3 


The multiplication rule (3.6) becomes 
\[ {}_{s+t}p_{[x]} = {}_sp_{[x]}\;{}_tp_{[x]+s}. \qquad (9.2) \]


Chapter 4 


We will denote the interest survivorship discount function for a policy issued at age x by $y_{[x]}$. 
From (9.2) we derive 

\[ y_{[x]}(k, k + n) = \frac{y_{[x]}(k + n)}{y_{[x]}(k)} = v(k, k + n)\,{}_np_{[x]+k}. \qquad (9.3) \]

Taking n = 1, the recursion formula (4.5) now becomes 
\[ y_{[x]}(k + 1) = y_{[x]}(k)\,v(k, k + 1)\,p_{[x]+k}. \]
It is important to note that with select mortality, (4.6) and (4.7) no longer hold. As 


mentioned, we need a separate table for each issue age, and there is no longer any necessary 
relation between the ages. Note, for example, that 


\[ y_{[x+k]}(n) = v(n)\,{}_np_{[x+k]}, \]


and even if v(k, k + n) = v(n) this is not the same as the right hand side of (9.3) unless k is 
greater than or equal to the select period. 
The subscript of [x] + k on annuity symbols is used in the same way as introduced in 
Section 4.3.2, except that now, modification to the mortality as well as the interest is needed. 
That is 

\[ \ddot{a}_{[x]+k}(c) = \sum_{j=0}^{\omega - x - k - 1} c_j\,v(k, k + j)\,{}_jp_{[x]+k}. \]

Note that $\ddot{a}_{[x]+k}(c) = \ddot{a}_{x+k}(c)$, if k is greater than the select period, and interest is constant. 
With non-constant interest however, these are not necessarily equal even if k is greater than 
the select period, as we observed in Chapter 4. 


Chapter 5 


In the third formula in (5.1), qp} becomes qp. 
The subscript [x] + k is used for insurances in the same way as noted above for 
annuities. That is 

\[ A_{[x]+k}(b) = \sum_{j=0}^{\omega - x - k - 1} b_j\,v(k, k + j + 1)\,{}_jp_{[x]+k}\,q_{[x]+k+j}. \]


Chapter 6 


No changes are necessary in this chapter except to note that the subscripts of [x] + k in reserve 
formulas are modified as we noted for Chapters 4 and 5 above. 


Chapter 8 
We define 

\[ \mu_{[x]}(t) = \lim_{h \to 0} \frac{{}_hq_{[x]+t}}{h}. \]

(We are following standard notation here. The square bracket is not necessary since the x is 
already separated from the t.) The basic identity (8.16) can then be written 

\[ {}_tp_{[x]} = e^{-\int_0^t \mu_{[x]}(r)\,dr}. \]
Of course, if the select period is 0 then $\mu_{[x]}(t)$ is just equal to $\mu(x + t)$ and agrees with the 
notation that we have already introduced in Chapter 8. 


Spreadsheets 


The interested reader may want to adapt the previous spreadsheets that have been introduced 
to incorporate a select and ultimate table. However, without doing this fully, one can still 
do calculations directly on the existing spreadsheet for each particular problem. For a policy 
issued at age (x) on a table with a select period of r years, one simply replaces the entries of 
$q_{x+t}$ with $q_{[x]+t}$ for t = 0, 1, ..., r - 1. 


9.4 Projections in annuity tables 


There are other situations where a two-dimensional approach to mortality rates is called 
for. One such example is annuity tables. We need however a different formulation than 
that presented above. The select tables we have described would not normally be used for 
annuities. Purchasers of annuities are generally people who expect to live longer than the 
average or they would be unlikely to enter into such a contract. In any event the insurer 
is not worried about deteriorating mortality with duration. The problem here is exactly the 
opposite, namely, people may live longer than the tables predict, requiring that more must be 
paid out in benefits than the premiums allow for. This has proved to be the case in recent years, 
as advances in medical and public health knowledge, better lifestyles and other factors have 
led to continually improving mortality, causing issuers of annuities as well as pension plans, 
to be confronted with what has been termed longevity risk. In addition to age, year of birth 
becomes an important variable when pricing annuities. A 60 year old born say in 1950 could 
be expected to have less chance of dying in the next year, than a 60 year old born in 1930, due 
to improvements in mortality that have come about over the 20 year period. In place of the 
rates $q_x$, we would like a two variable function $q_{x,n}$ where x denotes age and n denotes year 
of birth. 

We will alter this notation somewhat in order to point out the parallel nature of this 
situation with our original discussion of select mortality. Suppose we have picked a base year, 
say the year a mortality study was completed. We will now let $q_{[x]+t}$ denote the probability 
of dying within 1 year, for a person now age x + t who was age x in the base year. In other 
words we have just extended the select notation by allowing t to denote time elapsed since 
some particular event, which could be anything at all. In our original application, the event 
was observation for insurance purposes. For annuity tables, it is the year of the mortality study 
that produced the table. In this case, we have in place of (9.1) 


\[ q_{[x]+t} \ge q_{[x-s]+t+s}, \]


since, letting b denote the base year, the term on the left refers to someone born in year (b - x) 
while that on the right refers to someone born in a later year (b - x + s). 

There is, however, a major problem in constructing a table to show such rates. Suppose 
for example that a mortality study of annuitants was completed in the year 2000. The result 
would be a table showing $q_{[x]}$, the one year mortality rates for people age x in the year 
2000. Now suppose we want to know what $q_{[60]+1}$ is. This would be the rate for a person 
age 61 who was born in 1940, but in the year of the study no such person would ever have 
existed. The rate $q_{[60]+1}$ (as well as all rates with $t > 0$) could not be based purely on a 
statistical analysis of past mortality data, but would involve the prediction of future trends 
in mortality improvement. This is a rather difficult task, and present actuarial usage involves 
rather simple tools. The usual method is to assign a yearly projection factor $r_x$ for each age, 
which represents the reduction in mortality expected at that age for each year. That is, it is 
assumed that 

\[ q_{[x]+t} = q_{[x+t]}\,(1 - r_{x+t})^t. \]


Following is an example taken from an actual table known as the Uninsured Pensioner 
(UP) 1994 Table with projection scale AA. 
Example 9.1 The UP 1994 table shows male mortality rates for age 60-62 of 0.008576, 
0.009633, 0.010911, respectively and projection factors of 0.016, 0.015, 0.015, respectively. 
A male is age 60 in the year 2010. Using this table find the probability that he will live to age 
63. Compare this with the probability that a male who was age 60 in 1994 would live to age 63. 


Solution. The required probability is 
\[ [1 - 0.008576(0.984)^{16}]\,[1 - 0.009633(0.985)^{17}]\,[1 - 0.010911(0.985)^{18}] = 0.9778. \]
For the male age 60 in 1994, the probability would be 
\[ [1 - 0.008576]\,[1 - 0.009633(0.985)]\,[1 - 0.010911(0.985)^2] = 0.9716. \]
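
The calculation in Example 9.1 can be reproduced with a short sketch. It assumes the rule that the base-year rate at each attained age is reduced by the factor (1 - r) once for every year elapsed since the base year 1994; the dictionaries and the function name are our own.

```python
q_base = {60: 0.008576, 61: 0.009633, 62: 0.010911}   # base-year (1994) rates
r = {60: 0.016, 61: 0.015, 62: 0.015}                 # yearly projection factors

def survive_60_to_63(year_at_60, base_year=1994):
    """Probability that a male aged 60 in `year_at_60` lives to age 63."""
    prob = 1.0
    for age in (60, 61, 62):
        years_projected = (year_at_60 - base_year) + (age - 60)
        q = q_base[age] * (1 - r[age]) ** years_projected
        prob *= 1 - q
    return prob

print(round(survive_60_to_63(2010), 4), round(survive_60_to_63(1994), 4))
# prints 0.9778 0.9716
```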


There is no reason to believe that mortality improvement will stop, so the concept of 
a select and ultimate table does not appear in this context. Therefore it is not usual in the 
actuarial literature to view the table that we have described as a select table. Rather it is 
viewed as a collection of separate tables, one for each age x in the base year. Each of these 
separate tables is often referred to as a generational table, as it traces the mortality of people 
all born in the year b — x. In the construction above each generational table would appear as 
the row of the select table as described. Each column of this select table can be considered as 
a version of the original table projected to a future year, and is referred to as the static table 
for that year. In practice, insurers often simplify calculation by making use of the appropriate 
static table for setting annuity rates in a particular year, rather than using the more accurate 
generational tables, which would involve a separate table for each age. 


9.5 Further remarks 


There are many other situations in which the concept of select mortality applies. For another 
example involving rates decreasing with duration, suppose we are studying the effects of a 
certain treatment for cancer patients. This could depend both on age and the duration since 
treatment. In this case, if a person was treated a year ago, and is still alive, there is some 
indication that the treatment is working, and that such a person is more likely to live than 
someone of the same age who has just received the treatment. 


Exercises 


9.1 A person now age 62 was sold insurance 1 year ago at the age of 61. Mortality is given 
by the table at the beginning of the chapter. Find the probability that this individual will 
die between the ages of 63 and 65. 


9.2 A 4-year term insurance policy sold to a person age 61 provides for 1000 paid at the end 
of the year of death should this occur before age 65. Level annual premiums are payable 
for 4 years. The interest rate is a constant 5% and mortality is given by the table at the 
beginning of the chapter. 


(a) Find the premium. 


(b) Find ;V. 


9.3 Explain in words what the following indicates: 

$$({}_{6}p_{[30]+10})\,({}_{8}q_{[30]+16}).$$


9.4 For the table given in Section 9.2, construct the corresponding table of $\ell_{[x]+t}$, given that 
$\ell_{62} = 5000$. 


9.5 An annuity mortality table, with projection factors for improvement, constructed in the 
year 2000, shows 

$$q_{70} = 0.02,\quad q_{71} = 0.03,\quad q_{72} = 0.04,\quad r_{70} = 0.02,\quad r_{71} = 0.018,\quad r_{72} = 0.016.$$

Find the probability that a person age 70 in the year 2010 will die between the ages of 
72 and 73. 


Spreadsheet exercises 


9.6 Consider a select and ultimate version of the sample table, with a 15-year select period, 
as follows. For $x \ge 50$, and $t = 0, 1, \ldots, 14$ we take 

$$q_{[x]+t} = 1 - (1.00001)^{15-t}\,e^{-0.00005(1.09)^{x+t}}.$$

Interest rates are a constant 6%. For a 20-year endowment insurance on (50) with a level 
death benefit of 100 000 and a pure endowment of 100 000 at age 70, purchased by level 
net premium for 20 years, calculate (a) the premium, (b) 49V. 


9.7 Using the same table as in Exercise 9.6, calculate (a) $e_{[50]}$, (b) $e_{[50]+10}$. 


10 


Multiple-life contracts 


10.1 Introduction 


In previous chapters we have studied insurance and annuity contracts sold to a single 
individual. There are many situations when such contracts are sold to a group of several people. We 
will concentrate on the case of two lives, the most common arrangement, which is sufficient 
to exhibit most of the complications. 

The following are a few of the more usual situations. A married couple may wish to 
purchase an annuity that pays income while either of them is alive. They may also want an 
insurance policy that pays benefits on the second death, which in some jurisdictions enables 
them to pass on assets without paying estate taxes. Two business partners may desire a policy 
that pays a death benefit upon the first death of the two. A person may arrange for an annuity to 
be paid to a dependent, where payments begin only on the death of the supporting individual. 
We will discuss these and more in the following sections. For simplicity we assume aggregate 
mortality in this chapter. It is not too difficult to modify the results for the effects of selection. 


10.2 The joint-life status 


Consider a pair of lives, (x) and (y). Suppose that we consider the pair to be in a state of 
survival when both of them are alive. In other words, the pair fails on the first death. Such 
a pair is commonly known as a joint-life status. An annuity sold on a joint-life status would 
provide income as long as the status survived, that is, as long as both individuals were living. 
Annuity benefits would stop upon the first death. An insurance policy sold on a joint-life 
status would pay a death benefit upon the first death. 

While it is possible to construct life tables in this case, it is awkward, and it is best to work 
directly with probabilities. The basic quantity we need is the probability that both of (x) and 
(y) will be alive in $t$ years. We will denote this probability by ${}_tp_{xy}$. 

We will now derive an expression for ${}_tp_{xy}$ that can be calculated from the life table for 
single lives. The method discussed will be well known to the reader who is familiar with the 
concept of independence in probability theory (see Section A.2). 

Suppose, for example, that ${}_tp_x = 0.9$, ${}_tp_y = 0.8$. Imagine that we start by observing 100 
pairs of lives, where one is age x and the other is age y. Of these 100 pairs, we can expect 
that in 90 cases, the life age x will be alive in $t$ years. Out of these 90 cases we can expect 
that 80% of the time the life age y will survive $t$ years. This leaves on average 72 pairs of the 
100 where both lives survive, so we should have ${}_tp_{xy} = 0.72$. 

It is instructive to derive this in another way by looking at deaths rather than survivors. 
We expect 20 deaths altogether from the age y individuals. On average, two of these age y 
deaths should occur among the 10 cases where the age x individual dies. This leaves 18 age 
y deaths among the 90 cases of age x survivors, and therefore 90 - 18 = 72 cases where both 
survive. 

The particular numbers 0.9 and 0.8 do not matter and the same arguments show that in 
general we should have 


$${}_tp_{xy} = {}_tp_x\,{}_tp_y. \tag{10.1}$$


We define 


$${}_tq_{xy} = 1 - {}_tp_{xy},$$


the probability that the joint-life status will fail within $t$ years, which is the probability that at 
least one of (x) and (y) will die within $t$ years. 

The reader should carefully note that ${}_tq_{xy}$ is not equal to ${}_tq_x\,{}_tq_y$. The latter expression is the 
probability that both lives will die within t years, which is certainly less than the probability 
that at least one will die in this time interval. To express ${}_tq_{xy}$ in terms of single-life rates, we 
have ${}_tq_{xy} = 1 - {}_tp_x\,{}_tp_y = 1 - (1 - {}_tq_x)(1 - {}_tq_y)$, which gives 


$${}_tq_{xy} = {}_tq_x + {}_tq_y - {}_tq_x\,{}_tq_y. \tag{10.2}$$
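
As a small numerical illustration of (10.1) and (10.2), the sketch below uses the values 0.9 and 0.8 from the example of 100 pairs. The function names are ours and independence of the two lives is assumed throughout.

```python
def joint_survival(tp_x, tp_y):
    """(10.1): probability that both lives survive t years, assuming independence."""
    return tp_x * tp_y


def joint_failure(tq_x, tq_y):
    """(10.2): probability that at least one life dies within t years."""
    return tq_x + tq_y - tq_x * tq_y


tp_x, tp_y = 0.9, 0.8
p_joint = joint_survival(tp_x, tp_y)
q_joint = joint_failure(1 - tp_x, 1 - tp_y)
q_both = (1 - tp_x) * (1 - tp_y)  # both die: NOT the joint-life failure probability

print(round(p_joint, 2), round(q_joint, 2), round(q_both, 2))  # 0.72 0.28 0.02
```

The last figure, 0.02, is the probability that both lives die, which is much smaller than the probability 0.28 that the joint-life status fails.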


We now confess that although (10.1) and (10.2) are almost universally used, the argument 
we gave above is not quite accurate. It would be perfectly valid if the two lives were completely 
independent of each other. This is unlikely to be true, however, for two people who would 
choose to buy an insurance or annuity contract together, such as a married couple or two 
business partners. One can expect that their lives are intertwined to the extent that life- 
threatening occurrences for one would also affect the other. 

For example, suppose that the two lives are business partners who frequently fly together. 
If one of them were to die in a plane crash, it is likely that the other one would as well. To 
illustrate the point, we take an extreme case. Return to the example given above, but suppose 
that the 100 pairs we start with consist in every case of two people who always fly together. 
We claimed above that there would be on average 2 age y deaths from the 10 cases where 
the individual aged x died. But if we expect that the lives are likely to die together, we can 
certainly expect more than this. For example, if all the age x deaths were from plane crashes, 
we would necessarily have 10 age y deaths as well. This is extreme, but suppose that we 
have even 3 instead of 2 age y deaths. This leaves 17 instead of 18 deaths from the 90 age x 
survivors and we would have ${}_tp_{xy} = 0.73$. This shows that rather than (10.1) we would expect 
for the typical buyers of an insurance or annuity contract that 


$${}_tp_{xy} > {}_tp_x\,{}_tp_y. \tag{10.3}$$


For married couples there is an additional factor that supports (10.3). Statistics bear out the 
fact that upon the death of one partner, the survivor, deprived of the care and companionship 
of the deceased, often tends to die earlier than otherwise expected. This has been referred to 
as the ‘broken heart syndrome’. 

For convenience, we will assume (10.1), or equivalently (10.2), unless otherwise stated. 
The reader should note, however, that most of the key formulas remain true without this 
assumption. The main use of (10.1) is to conveniently calculate numerical quantities, given 
the single-life data. 

Generalizations of the joint-life status can be found in Chapters 17 and 19. On the question 
of dependence, see in particular Sections 17.5 and 17.6. 


10.3 Joint-life annuities and insurances 


Consider a joint-life annuity contract paying $c_k$ at time $k$ provided that both (x) and (y) are 
alive. In order for a payment to be made at time k we need both lives to have survived to that 
time, and the probability of this is just ${}_kp_{xy}$. The present value of the benefits, denoted by 
$\ddot{a}_{xy}(c)$, is given by a formula exactly the same as (4.1), except ${}_kp_{xy}$ replaces ${}_kp_x$: 


$$\ddot{a}_{xy}(c) = \sum_{k=0}^{N-1} c_k\,v(k)\,{}_kp_{xy}, \tag{10.4}$$


where $N = \min\{\omega - x, \omega - y\}$. 

We use N in this way throughout the chapter. To verify that it is the appropriate upper 
bound, note for example that if x = 40, y = 70 and $\omega = 110$, the last possible payment would 
be at time 39. At time 40, (70) would not be living according to our model. As with single 
lives, we will use $1_\infty$ to denote a vector with entries of 1, running to duration N - 1, and by 
convention the omission of a benefit vector or function implies that it is $1_\infty$. 

In the continuous case we have the formula 


$$\bar{a}_{xy}(c) = \int_0^N c(t)\,v(t)\,{}_tp_{xy}\,dt, \tag{10.5}$$


for the present value of an annuity, with payments made continuously at the annual rate of 
c(t) at time t, provided both (x) and (y) are alive. 

We can also consider a joint-life insurance policy that pays $b_k$ at the end of the year of 
the first death of (x) and (y) when this occurs between time k and time k + 1. In other words, 
payment will be made at time k + 1 if both lives survive k years, but both do not survive k + 1 
years. The probability of this is just 


$${}_kp_{xy} - {}_{k+1}p_{xy}.$$

This probability can also be expressed as 


$${}_kp_{xy}\,q_{x+k:y+k},$$


since for such an event to happen, both lives must survive for k years and then one must 
die in the subsequent year. (A colon is used to separate the two ages in a joint-life status if 
needed for clarity.) This latter expression can also be derived from the joint-life version of the 
multiplication formula, 


$${}_{k+1}p_{xy} = {}_kp_{xy}\,p_{x+k:y+k},$$


since for both (x) and (y) to survive k + 1 years, they must first both survive k years, and then, 
being age x + k and y + k respectively, they must both survive one more year. 

The present value of the joint-life insurance, denoted by $A_{xy}(b)$, can be written, analogously 
to the second and third expressions in (5.1), as 


$$A_{xy}(b) = \sum_{k=0}^{N-1} b_k\,v(k+1)\,({}_kp_{xy} - {}_{k+1}p_{xy}) = \sum_{k=0}^{N-1} b_k\,v(k+1)\,{}_kp_{xy}\,q_{x+k:y+k}. \tag{10.6}$$


Suppose that such a joint-life insurance were sold with annual premiums, based on the 
premium pattern vector $\rho$. Premiums will cease upon the first death, so the premium payments 
form a joint-life annuity. We determine the initial premium $\pi_0$ in exactly the same way as 
with a single-life insurance, namely, 


$$\pi_0 = \frac{A_{xy}(b)}{\ddot{a}_{xy}(\rho)}. \tag{10.7}$$
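
The joint-life formulas (10.4), (10.6) and (10.7) are easy to evaluate once single-life survival probabilities are available. The sketch below is illustrative only. It assumes independence as in (10.1), a constant interest rate of 5%, a limiting age of 110, and a made-up Gompertz-type mortality rate that is not taken from any published table. The ages 60 and 65, the level benefit of 100 000 and all function names are likewise our own choices.

```python
import math

OMEGA = 110   # assumed limiting age
RATE = 0.05   # assumed constant annual interest rate


def q(age):
    """Illustrative one-year death probability (hypothetical, for this sketch only)."""
    if age >= OMEGA - 1:
        return 1.0
    return 1.0 - math.exp(-0.00005 * 1.09 ** age)


def kp(age, k):
    """Probability that a life aged `age` survives k more years."""
    prob = 1.0
    for j in range(k):
        prob *= 1.0 - q(age + j)
    return prob


def v(k):
    """Discount factor for k years at constant interest."""
    return (1.0 + RATE) ** (-k)


def joint_annuity(x, y, c):
    """(10.4): sum of c_k v(k) kp_xy for k = 0..N-1, with kp_xy = kp_x * kp_y."""
    n = min(OMEGA - x, OMEGA - y)
    return sum(c(k) * v(k) * kp(x, k) * kp(y, k) for k in range(n))


def joint_insurance(x, y, b):
    """(10.6), first form: sum of b_k v(k+1) (kp_xy - (k+1)p_xy)."""
    n = min(OMEGA - x, OMEGA - y)
    total = 0.0
    for k in range(n):
        p_k = kp(x, k) * kp(y, k)
        p_k1 = kp(x, k + 1) * kp(y, k + 1)
        total += b(k) * v(k + 1) * (p_k - p_k1)
    return total


def level(k):
    """Benefit or premium pattern equal to 1 at every duration."""
    return 1.0


a_xy = joint_annuity(60, 65, level)
A_xy = joint_insurance(60, 65, level)
premium = 100_000 * A_xy / a_xy   # (10.7) with a level premium pattern
print(round(a_xy, 4), round(A_xy, 4), round(premium, 2))
```

With a level benefit of 1 and constant interest, the two values satisfy the familiar relation A = 1 - d ä with d = i/(1 + i), which gives a convenient check on the calculation.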


10.4 Last-survivor annuities and insurances 


10.4.1 Basic results 


Consider a pair of lives (x) and (y), which we consider to be in a state of survival if either 
of them is alive. In other words, the pair fails upon the second death. This is known as a 
last-survivor status. The standard symbol for this is $\overline{xy}$, which distinguishes it from a joint-life 
status. The probability that this status will be in a state of survival at time $t$ is denoted by ${}_tp_{\overline{xy}}$. 
To calculate this, we use (A.3), a basic fact from probability theory, already illustrated above 
in (10.2), which says that the probability that at least one of two events occurs is given by 
the sum of the probabilities of each occurring, less the probability that they both occur. This 
gives 


$${}_tp_{\overline{xy}} = {}_tp_x + {}_tp_y - {}_tp_{xy}. \tag{10.8}$$


We let ${}_tq_{\overline{xy}}$ denote the probability that $\overline{xy}$ will fail before time t. Clearly ${}_tq_{\overline{xy}} = 1 - {}_tp_{\overline{xy}}$, so 
that if our independence assumption (10.1) holds, we can substitute in (10.8) to obtain 


$${}_tq_{\overline{xy}} = {}_tq_x\,{}_tq_y. \tag{10.9}$$


Alternatively, this can be derived directly by noting that in order for this status to fail within $t$ 
years, we need to have both lives fail within $t$ years. 

The probability that $\overline{xy}$ fails between time $t$ and time $s + t$ is given either by 
${}_tp_{\overline{xy}} - {}_{s+t}p_{\overline{xy}}$ or by ${}_{s+t}q_{\overline{xy}} - {}_tq_{\overline{xy}}$. Assuming independence, the latter expression simplifies to 


$$({}_{s+t}q_x)({}_{s+t}q_y) - ({}_tq_x)({}_tq_y).$$


Formulas for annuities and insurances based on the last-survivor status are easily obtained. 
As in the joint-life case we simply replace the subscript $x$ by $\overline{xy}$. So, for an annuity paying 
$c_k$ at time k, provided that at least one of (x) and (y) is alive, the present value is just 


$$\ddot{a}_{\overline{xy}}(c) = \sum_{k=0}^{M-1} v(k)\,c_k\,{}_kp_{\overline{xy}} = \ddot{a}_x(c) + \ddot{a}_y(c) - \ddot{a}_{xy}(c), \tag{10.10}$$


where the upper limit M now denotes the maximum of $\omega - x$ and $\omega - y$. The final expression 
results directly from (10.8). 

The analogous formula holds for continuous annuities with $\bar{a}$ replacing $\ddot{a}$. 

Similarly, the present value of an insurance contract paying $b_k$ at the end of the year of 
the second death of (x) and (y), if this occurs between time k and k + 1, is given by 


$$A_{\overline{xy}}(b) = \sum_{k=0}^{M-1} v(k+1)\,b_k\,({}_kp_{\overline{xy}} - {}_{k+1}p_{\overline{xy}}) = A_x(b) + A_y(b) - A_{xy}(b), \tag{10.11}$$


where we again use (10.8), as well as the second formula in (5.1) to derive the final expression. 
Formula (10.11) can be verified directly. Suppose, for example, that (x) dies first, at a time 
between j and j + 1. Then the first term on the right hand side pays $b_j$ and the third term pays 
$-b_j$, so a net amount of 0 is paid on the first death. When (y) dies later, at a time between k 
and k + 1, only the second term pays and we get the required death benefit of $b_k$. The same 
type of reasoning can be applied to verify (10.10) as well as many other types of multiple-life 
contracts. 

These last-survivor annuities and insurances are a special case of more general contracts 
that we will investigate in Sections 10.6 and 10.7. 
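
The direct verification of (10.11) can also be carried out numerically by listing every possible pair of death years. The sketch below does this for two short, made-up mortality tables. The rates, benefits, interest rate and variable names are hypothetical, and the lives are assumed independent.

```python
# Hypothetical one-year death probabilities; each life is certain to die by the end.
qx = [0.10, 0.20, 0.40, 1.00]               # (x) can live at most 4 more years
qy = [0.05, 0.10, 0.20, 0.30, 0.50, 1.00]   # (y) can live at most 6 more years
RATE = 0.05                                 # assumed constant interest rate
b = [100.0, 90.0, 80.0, 70.0, 60.0, 50.0]   # b_k is paid at time k + 1


def death_year_probs(q):
    """P(death occurs between time k and k + 1), for k = 0, 1, ..."""
    probs, alive = [], 1.0
    for qk in q:
        probs.append(alive * qk)
        alive *= 1.0 - qk
    return probs


def v(k):
    """Discount factor for k years at constant interest."""
    return (1.0 + RATE) ** (-k)


dx, dy = death_year_probs(qx), death_year_probs(qy)

# Single-life insurances.
A_x = sum(b[k] * v(k + 1) * dx[k] for k in range(len(dx)))
A_y = sum(b[k] * v(k + 1) * dy[k] for k in range(len(dy)))

# Enumerate every pair of death years (j for x, k for y), assuming independence.
A_first = sum(b[min(j, k)] * v(min(j, k) + 1) * dx[j] * dy[k]
              for j in range(len(dx)) for k in range(len(dy)))
A_last = sum(b[max(j, k)] * v(max(j, k) + 1) * dx[j] * dy[k]
             for j in range(len(dx)) for k in range(len(dy)))

print(abs(A_last - (A_x + A_y - A_first)) < 1e-9)  # True
```

Here `A_first` is the joint-life value and `A_last` the last-survivor value. For every pair of death years the two payments together equal the two single-life payments, which is exactly the argument given in the text.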


10.4.2 Reserves on second-death insurances 


Reserves on last-survivor insurances present a particular problem. For a second-death 
insurance, the true reserve at time k depends on the state of the pair. There are three possible 
cases. (For simplicity in notation we assume constant interest. In the general case one uses 
the investment discount function $v \circ k$ in computing $A$ and $\ddot{a}$.) 


${}_kV^{xy} = A_{\overline{x+k:y+k}}(b \circ k) - \ddot{a}_{\overline{x+k:y+k}}(\pi \circ k)$ if both lives survive k years. 

${}_kV^{x} = A_{x+k}(b \circ k) - \ddot{a}_{x+k}(\pi \circ k)$ if (x) only survives k years. 

${}_kV^{y} = A_{y+k}(b \circ k) - \ddot{a}_{y+k}(\pi \circ k)$ if (y) only survives k years. 

The above formulas also assume that the premium pattern remains unchanged on the first 
death. If the contract called for reduced premiums in such an event, then suitable adjustments 
would be required in the second term of the formulas for ${}_kV^{x}$ and ${}_kV^{y}$. 

It may seem puzzling at first that the reserve on this contract should make a sudden change 
on the first death, but a moment's reflection will verify that this is perfectly logical. The pairs 
for which the first death has already occurred present a greater liability and a higher reserve 
is needed. Taking a retrospective point of view, what happens is that there is a transfer each 
period from the funds of the pairs with both individuals still living, to the funds of those where 
one has already died. 

The problem is however that there is no reason for a pair of lives to report the first death 
to the insurer, unless so stipulated in the contract. This means that the insurer might well not 
be aware of the state of the various pairs. In this case an appropriate solution would be to hold 
a weighted average of the different reserves. That is, for each contract the reserve at time k 
would be 


$$\frac{{}_kp_{xy}\,{}_kV^{xy} + ({}_kp_x - {}_kp_{xy})\,{}_kV^{x} + ({}_kp_y - {}_kp_{xy})\,{}_kV^{y}}{{}_kp_{\overline{xy}}}.$$


If the actual mortality experience is close to that of the weights used above, then this 
method should produce aggregate reserves which are close to what they should be by the 
more exact method. 

Note however that it would not be reasonable to base cash values on such a blended 
reserve figure, which will clearly be higher than ${}_kV^{xy}$. This would create a definite 
anti-selection opportunity as those pairs for which both are still living could cash in the policy 
and receive an amount which is more than that which they are entitled to. 
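
The weighted-average reserve described above is simple to compute once the three state reserves are known. In the sketch below the function name, the survival probabilities and the three reserve values are all hypothetical inputs chosen for illustration, and the weights assume independence of the two lives.

```python
def blended_reserve(kp_x, kp_y, v_both, v_x_only, v_y_only):
    """Weighted average of the three state reserves, assuming independent lives.

    kp_x, kp_y : k-year survival probabilities of (x) and (y)
    v_both     : reserve if both are alive at time k
    v_x_only   : reserve if only (x) is alive
    v_y_only   : reserve if only (y) is alive
    """
    kp_xy = kp_x * kp_y              # both alive, (10.1)
    kp_last = kp_x + kp_y - kp_xy    # at least one alive, (10.8)
    weights = (kp_xy / kp_last,
               (kp_x - kp_xy) / kp_last,
               (kp_y - kp_xy) / kp_last)
    reserve = weights[0] * v_both + weights[1] * v_x_only + weights[2] * v_y_only
    return reserve, weights


# Hypothetical inputs, chosen only to illustrate the calculation.
reserve, weights = blended_reserve(0.9, 0.8, 200.0, 450.0, 500.0)
print(round(reserve, 2), [round(w, 4) for w in weights])
```

The three weights are the conditional probabilities of the three states, given that the contract is still in force, so they sum to 1. The blended figure lies above the both-alive reserve, which is the source of the anti-selection problem just mentioned.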


10.5 Moment of death insurances 


To handle moment of death insurances for two-life statuses, we need the joint-life analogue 
of the force of mortality. 


Definition 10.1 The force of failure for the joint-life status (xy) at time t is the quantity 


$$\mu_{xy}(t) = -\frac{\frac{d}{dt}\,{}_tp_{xy}}{{}_tp_{xy}} = -\frac{d}{dt}\log({}_tp_{xy}).$$


The independence assumption (10.1) lets us calculate this from the single life functions. 


$$\mu_{xy}(t) = -\frac{d}{dt}\left(\log {}_tp_x + \log {}_tp_y\right) = \mu_x(t) + \mu_y(t). \tag{10.12}$$


Consider an insurance policy that pays b(t) at time t if the first death of (x) and (y) occurs 
at that time. The present value will be denoted by $\bar{A}_{xy}(b)$ and given, analogously to (8.18), as 


$$\bar{A}_{xy}(b) = \int_0^N b(t)\,v(t)\,{}_tp_{xy}\,\mu_{xy}(t)\,dt. \tag{10.13}$$

The present value of an insurance paying b(t) at time t if the second death of (x) and (y) 
occurs at that time is denoted by $\bar{A}_{\overline{xy}}(b)$. The same reasoning as used after formula (10.11) 
establishes the analogous formula 


$$\bar{A}_{\overline{xy}}(b) = \bar{A}_x(b) + \bar{A}_y(b) - \bar{A}_{xy}(b). \tag{10.14}$$


As with the single-life case, we are faced with the problem of evaluating $\bar{A}_{xy}$ from the life 
table. Assume that b takes the constant value of $b_k$ over the year running from time k to time 
k + 1. It is natural to ask if the same $i/\delta$ correction that we used for single-life insurances 
is also applicable to the joint-life case. Recall that this correction came from (8.20), so the 
question is whether the corresponding statement is true for joint-life statuses. That is, is it 
true that ${}_tp_{xy}\,\mu_{xy}(t) = q_{xy}$ for 0 < t < 1? Assume UDD and therefore (8.20) for single 
lives. Then, for 0 < t < 1, 


$$\begin{aligned}
{}_tp_{xy}\,(\mu_x(t) + \mu_y(t)) &= {}_tp_y\,q_x + {}_tp_x\,q_y \\
&= (1 - t\,q_y)\,q_x + (1 - t\,q_x)\,q_y \\
&= q_{xy} + (1 - 2t)\,q_x q_y,
\end{aligned}$$


where we use (10.2) for the last equality. We see that the required condition does not hold 
exactly, due to the extra term of $(1 - 2t)\,q_x q_y$. Arguing as in Section 8.6.2, we deduce that 


$$\bar{A}_{xy}(b) = A_{xy}\!\left(\frac{i}{\delta} * b\right) + R, \tag{10.15}$$

where $b = (b_0, b_1, \ldots)$ and 

$$R = \sum_{k=0}^{N-1} r(k)\,b_k\,v(k+1)\,{}_kp_{xy}\,q_{x+k}\,q_{y+k}, \tag{10.16}$$


with $r(k) = \int_0^1 v(k+1, k+t)(1 - 2t)\,dt$. Analysis shows that r(k) > 0 so that R is positive, but 
likely to be quite small and safely ignored in practice. Using the $i/\delta$ correction will slightly 
understate the value of the insurance, but it is a reasonable approximation. R is the present 
value of a contract that pays off only if (x) and (y) die in the same year, an event that will have 
small probability for most ages. Moreover, since 1 - 2t is positive for $0 \le t < \tfrac{1}{2}$ and negative 
for $\tfrac{1}{2} < t \le 1$, the cancellation in the integral will tend to make r(k) small. 
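
To get a feeling for the size of r(k), the integral can be evaluated for a constant interest rate. The sketch below assumes that v(k+1, k+t) is the value at time k + 1 of 1 paid at time k + t, so that under a constant rate i it equals (1 + i)^(1-t) and r(k) does not depend on k. The closed form comes from integrating by parts and is our own working, and the function names are ours.

```python
import math


def r_closed_form(i):
    """Integral over [0, 1] of (1 + i)**(1 - t) * (1 - 2t) dt, evaluated analytically."""
    delta = math.log(1.0 + i)
    return (i + 2.0) / delta - 2.0 * i / delta ** 2


def r_midpoint(i, steps=100_000):
    """The same integral by the midpoint rule, as an independent check."""
    h = 1.0 / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * h
        total += (1.0 + i) ** (1.0 - t) * (1.0 - 2.0 * t)
    return total * h


i = 0.05
print(r_closed_form(i), r_midpoint(i), math.log(1.0 + i) / 6.0)
```

Expanding the integrand in powers of the force of interest shows that r is approximately one-sixth of the force of interest, which is the third number printed. At 5% interest this is less than 0.01, consistent with the remark that R can usually be ignored.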


10.6 The general two-life annuity contract 


There is a demand for two-life annuities that are more flexible than the ones we described in 
previous sections. One popular arrangement is for benefits to continue while either person 
is alive, but for the amount to reduce when only one of the parties is alive. For example, a 
married couple may wish to provide for an annuity at retirement that will pay 60 000 yearly 
while both are alive, reducing to 40 000 yearly when only one is alive. This one-third reduction 
is a common one, reflecting the fact that while one person can live more cheaply than two, 
he/she will need more than half of the amount, since many expenses, such as housing, will 
not necessarily reduce. Many variations are possible. Benefits need not be symmetric and 
could vary according to the particular survivor. A common provision in pension plans is that 
the income stays level as long as the employee is alive, but will continue to the spouse at a 
reduced level upon the death of the employee. We present a single formula that covers all 
possibilities. 

A general life annuity on the pair (x) and (y) can be described by three annuity benefit 
vectors: f, where $f_k$ is the amount paid at time k if only (x) is alive; g, where $g_k$ is the amount 
paid at time k if only (y) is alive; and h, where $h_k$ is the amount paid at time k if both (x) and 
(y) are alive. Let j = h - f - g. We can think of the contract as three separate annuities. One 
is a life annuity on (x) with benefit vector f. Another is a life annuity on (y) with benefit vector 
g. These two will provide for the required payments when only one of the pair is living. The 
third contract, a joint-life annuity with benefit vector j, must adjust the payment when both are 
alive. If both are living at time k, the first two annuities will provide $f_k + g_k$, so the third must 
provide the difference, $j_k = h_k - f_k - g_k$. 
The present value of the complete contract is then the sum of the three separate present values, 
which is just 


$$\ddot{a}_x(f) + \ddot{a}_y(g) + \ddot{a}_{xy}(j). \tag{10.17}$$


This formula reduces the calculation of present values for the general two-life annuity to 
calculating those for single or joint-life annuities. (See Section 10.12 for spreadsheet methods 
to calculate the latter.) 


Example 10.1 Verify that (10.17) reduces to (10.10), when the yearly benefit is a constant 
1 unit. 

Solution. We have $f_k = g_k = h_k = 1$ for all k, so $j_k = -1$ for all k and (10.17) gives 


$$\ddot{a}_{\overline{xy}} = \ddot{a}_x + \ddot{a}_y - \ddot{a}_{xy}.$$


Example 10.2 An annuity on (x) and (y) provides yearly payments as long as either (x) or 
(y) is alive. Payments begin at 12, but reduce to 8 if (x) only is alive or to 6 if (y) only is 
alive. Find the present value. 


Solution. $f_k = 8$, $g_k = 6$, and $h_k = 12$, so $j_k = -2$ for all k. This gives a present value of 


$$8\ddot{a}_x + 6\ddot{a}_y - 2\ddot{a}_{xy}.$$
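
Formula (10.17) can be checked against a direct calculation that weights each benefit by the probability of the corresponding state. The sketch below uses the benefits of Example 10.2. Everything else is an assumption made for illustration: the ages 65 and 62, the 5% interest rate, the limiting age of 110, the made-up Gompertz-type mortality rate, the function names, and independence of the two lives.

```python
import math

OMEGA = 110   # assumed limiting age
RATE = 0.05   # assumed constant annual interest rate


def q(age):
    """Illustrative one-year death probability (hypothetical, for this sketch only)."""
    return 1.0 if age >= OMEGA - 1 else 1.0 - math.exp(-0.00005 * 1.09 ** age)


def kp(age, k):
    """Probability that a life aged `age` survives k more years."""
    prob = 1.0
    for j in range(k):
        prob *= 1.0 - q(age + j)
    return prob


def two_life_annuity(x, y, f, g, h):
    """(10.17): a_x(f) + a_y(g) + a_xy(j) with j = h - f - g."""
    horizon = max(OMEGA - x, OMEGA - y)
    total = 0.0
    for k in range(horizon):
        px, py = kp(x, k), kp(y, k)
        j_k = h(k) - f(k) - g(k)
        total += (1.0 + RATE) ** (-k) * (f(k) * px + g(k) * py + j_k * px * py)
    return total


def two_life_annuity_by_state(x, y, f, g, h):
    """Direct check: weight each benefit by the probability of its state at time k."""
    horizon = max(OMEGA - x, OMEGA - y)
    total = 0.0
    for k in range(horizon):
        px, py = kp(x, k), kp(y, k)
        only_x, only_y, both = px * (1.0 - py), py * (1.0 - px), px * py
        total += (1.0 + RATE) ** (-k) * (f(k) * only_x + g(k) * only_y + h(k) * both)
    return total


def f(k):
    return 8.0    # paid while only (x) is alive


def g(k):
    return 6.0    # paid while only (y) is alive


def h(k):
    return 12.0   # paid while both are alive


pv = two_life_annuity(65, 62, f, g, h)
pv_check = two_life_annuity_by_state(65, 62, f, g, h)
print(round(pv, 4), abs(pv - pv_check) < 1e-9)
```

The two functions agree because the probability that only (x) is alive at time k is the survival probability of (x) less the joint survival probability, and similarly for (y).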


Example 10.3 An annuity pays 1 unit each year, provided that at least one of (50) and (65) 
is living and is over age 70, but not if (50) is alive and under age 65. Find the present value. 


Solution. Letting x refer to (50) and y to (65), 


$$f = (0_{20}, 1_\infty), \quad g = (0_5, 1_\infty), \quad h = (0_{15}, 1_\infty),$$

so $j = -(0_5, 1_{10}, 0_5, 1_\infty)$. The present value is then 


$$\ddot{a}_{50}(0_{20}, 1_\infty) + \ddot{a}_{65}(0_5, 1_\infty) - \ddot{a}_{50:65}(0_5, 1_{10}, 0_5, 1_\infty).$$
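
The benefit vectors in this solution can be generated mechanically from the payment rule, which is a useful check when the conditions are more involved. The sketch below is ours. It reads "over age 70" as having attained age 70 and "under age 65" as not yet having attained age 65, which is the reading that reproduces the vectors given in the solution.

```python
def benefit(k, alive_50, alive_65):
    """Payment at time k under the rule of Example 10.3."""
    age_50, age_65 = 50 + k, 65 + k
    # At least one living person has attained age 70.
    someone_over_70 = (alive_50 and age_50 >= 70) or (alive_65 and age_65 >= 70)
    # No payment while (50) is alive and has not yet attained age 65.
    blocked = alive_50 and age_50 < 65
    return 1 if someone_over_70 and not blocked else 0


HORIZON = 25  # enough durations to see the pattern settle

f = [benefit(k, True, False) for k in range(HORIZON)]   # only (50) alive
g = [benefit(k, False, True) for k in range(HORIZON)]   # only (65) alive
h = [benefit(k, True, True) for k in range(HORIZON)]    # both alive
j = [hk - fk - gk for hk, fk, gk in zip(h, f, g)]
print(j)
```

The printed vector is 0 for durations 0 to 4, -1 for durations 5 to 14, 0 for durations 15 to 19 and -1 thereafter, in agreement with the vector j found above.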


Example 10.4 Find a formula for the present value of an annuity that pays $c_k$ at time k, 
provided that (y) is alive, but (x) is not alive. 


Solution. $f_k = 0$, $g_k = c_k$, $h_k = 0$, so that $j_k = -c_k$ for all k. The present value is 
$\ddot{a}_y(c) - \ddot{a}_{xy}(c)$. 


A contract of the type described in Example 10.4 is known as a reversionary annuity. It 
provides a life annuity on one life, which does not begin until another life has died. It can 
be used for a similar purpose as life insurance, except that the proceeds are paid out as a life 
annuity to another person, rather than as a lump sum. As well, reversionary annuities are often 
used to put a value on certain inheritances that stipulate that the income from a certain asset 
will first go to a certain individual and then, upon the death of that party, revert to another 
person. 


Remark We comment briefly here on the effect of using assumption (10.1) in which ${}_tp_{xy}$ 
was approximated by a quantity that is normally too low, as we saw in (10.3). It follows that 
this approximation will give values for $\ddot{a}_{xy}(b)$ that are slightly too low (assuming the entries 
of b are nonnegative). Since the coefficient of this term is negative in the most common 
examples of two-life annuities, as illustrated above, the standard assumption of independence 
means that most two-life annuity premiums are somewhat higher than they would be if a more 
realistic model were used. 


10.7 The general two-life insurance contract 


The most common type of policy sold to two lives is the joint-life insurance described above, 
where payment is made on the first death. In some cases, however, people want policies that 
will pay on the second death. As we indicated in the introduction to this chapter, this can be 
used as a means of minimizing estate taxes. We will in fact consider a general type of contract 
where death benefits can be paid on both deaths. Assuming payment at the moment of death, 
such a policy will be described by two death benefit functions: b(t), the amount paid at the 
time of the first death; and d(t), the amount paid at the time of the second death. We can view 
this as two separate insurances, one on the status $xy$ with benefit function b, and one on the 
status $\overline{xy}$ with benefit function d. Substituting from (10.14), the present value of benefits is 


$$\bar{A}_{xy}(b) + \bar{A}_{\overline{xy}}(d) = \bar{A}_x(d) + \bar{A}_y(d) + \bar{A}_{xy}(b - d). \tag{10.18}$$


This can be verified directly by the same type of reasoning as that used after formula (10.11). 


10.8 Contingent insurances 


In some insurance policies written on two lives, the benefits depend on the order of death. In 
the most common cases, a designated individual must die first or second in order to receive 
benefits. If this does not occur, then no benefits are paid. For this reason these are known as 
contingent insurances. We will first consider such contracts with benefits payable at the end 
of the year of death. It is assumed throughout this section that there is zero probability that 
two lives will die at exactly the same instant of time. 


10.8.1 First-death contingent insurances 


We first need to compute some new probabilities. Let $q_{\overset{1}{x}y}$ denote the probability that in the 
next year (x) will die and that (y) will be alive at the time of the death of (x). The symbol of 1 
above x indicates that (x) is the first to die. Note that (y) does not have to die within the 1-year 
period in order for the event in question to occur. 

The first problem is to estimate this probability from the life table. We will derive the 
answer here intuitively. A more formal derivation appears in Chapter 17. 

We note that $q_{\overset{1}{x}y}$ can be approximated by another probability. Consider the following 
events: event A consists of (x) dying within the year, before the death of (y); event B consists 
of (x) dying within the year, and (y) surviving to the middle of the year. Then $q_{\overset{1}{x}y}$ is the 
probability of A. We claim that this is close to the probability of event B. These events are of 
course not the same. If (x) and (y) both die in the first half of the year, with (x) dying first, then 
A occurs but not B. On the other hand, if (x) and (y) both die in the second half of the year 
with (y) dying first, then B occurs but not A. However, both these latter situations are relatively 
rare, involving both lives dying in the same year. In any event, as long as the probabilities of 
these two death situations are roughly the same, they will tend to cancel each other out, and 
we can assume that the probability of A is approximately the same as the probability of B. 

The probability of B can be readily computed from the life table. The probability that 
(x) dies during the year is just $q_x$. Assuming UDD, the probability that (y) will be alive in 
the middle of the year is ${}_{1/2}p_y = 1 - q_y/2$. Making the usual independence assumption that 
we did in deriving (10.1), we deduce that the probability that both of these occurrences will 
happen is the product of probabilities. In other words, assuming UDD, 


$$q_{\overset{1}{x}y} = q_x - \tfrac{1}{2}\,q_x q_y. \tag{10.19}$$


As a check on this, let us compute another expression for $q_{xy}$, the probability that at least 
one of the two lives will die within the year. This can happen in two mutually exclusive ways. 
Either (x) dies during the year, with (y) alive at that time, or (y) dies during the year, with (x) 
alive at that time. By basic rules of probability theory we can sum the two probabilities to get 
the probability that either will happen. We must have that 


$$q_{xy} = q_{\overset{1}{x}y} + q_{x\overset{1}{y}}. \tag{10.20}$$


Adding the right hand side of (10.19) to that with (x) and (y) interchanged does indeed give 
$q_{xy}$, as shown by (10.2). 
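
Formula (10.19) and the check (10.20) can be confirmed numerically. The sketch below uses two hypothetical one-year rates and our own function names. The numerical integral assumes UDD, so that the time of death of (x) has constant density equal to the one-year rate over the year and the survival probability of (y) to time t is linear in t, together with independence of the two lives.

```python
def q_first(q_x, q_y):
    """(10.19): probability that (x) dies within the year with (y) then alive."""
    return q_x - 0.5 * q_x * q_y


def q_first_numeric(q_x, q_y, steps=100_000):
    """Midpoint-rule integral over the year of (1 - t * q_y) * q_x."""
    h = 1.0 / steps
    return sum((1.0 - (n + 0.5) * h * q_y) * q_x * h for n in range(steps))


q_x, q_y = 0.03, 0.05           # hypothetical one-year rates
q_xy = q_x + q_y - q_x * q_y    # (10.2) with t = 1

print(q_first(q_x, q_y), q_first_numeric(q_x, q_y))
print(abs(q_first(q_x, q_y) + q_first(q_y, q_x) - q_xy) < 1e-12)  # True
```

The agreement of the two values in the first line reflects the fact that, under these assumptions, (10.19) holds exactly and not merely as an approximation.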

Consider now an insurance contract where a designated life must die before another in 
order to collect. This arises in the following situation. In any insurance policy, a particular 
individual, known as the beneficiary, is designated to receive the death benefit upon death of 
the insured. If the beneficiary dies before the insured, a new beneficiary is normally chosen. 
Suppose, however, that an insured has no other beneficiary in mind. To handle this, insurers 
are willing to offer a contract that only pays if a stated beneficiary is alive at the death of the 
insured. 

Suppose that a policy sold on the lives (x) and (y) provides for benefits to be paid at the 
end of the year of the death of (x) provided that (y) is alive at the time of such death. If (y) dies 
before (x), nothing is paid. For a death benefit vector b, the present value of the benefits for 
this contract will be designated by $A_{\overset{1}{x}y}(b)$. We calculate this by the standard formula for an 
insurance present value, as given in (5.1) and again in (10.6). We need only insert, for each k, 
the probability that the conditions for payment will occur in the year k to k + 1. Focus on the 
second expression in the right hand side of (10.6) where we wrote the probability that the first 
death of (x) and (y) will occur between time k and time k + 1 as ${}_kp_{xy}\,q_{x+k:y+k}$. In the present 
case, in order that the death benefit is paid at time k + 1, we still require that both (x) and (y) 
are alive at time k, but then we need (x + k) to die in the next year, and to be the first of 
the two lives to die. In place of $q_{x+k:y+k}$ we want $q_{\overset{1}{x+k}:y+k}$. This gives 


$$A_{\overset{1}{x}y}(b) = \sum_{k=0}^{N-1} b_k\,v(k+1)\,{}_kp_{xy}\,q_{\overset{1}{x+k}:y+k}. \tag{10.21}$$

As a check, we can use (10.20) to see that 


$$A_{xy}(b) = A_{\overset{1}{x}y}(b) + A_{x\overset{1}{y}}(b), \tag{10.22}$$


which must be true, since a joint-life insurance can be considered as two separate contingent 
insurances, each paying benefits if a particular life dies first. 


10.8.2 Second-death contingent insurances 


Suppose we have a policy where death benefits are paid at the end of the year of death only 
if (x) is the second of the two lives to die. Denote the present value of this by $A_{\overset{2}{x}y}(b)$. We 
calculate this as follows. Think of a regular single-life policy on (x) as two separate contracts. 
The first pays if (x) dies before some other life (y), and the second pays if (x) dies after (y). 
We must have that 


$$A_x(b) = A_{\overset{1}{x}y}(b) + A_{\overset{2}{x}y}(b),$$

so that 


$$A_{\overset{2}{x}y}(b) = A_x(b) - A_{\overset{1}{x}y}(b). \tag{10.23}$$
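
The first-death and second-death contingent insurances can be computed together, using (10.21) with the UDD value (10.19) for the one-year contingent probability, and then (10.23). The sketch below uses short made-up mortality tables, a level benefit of 1000 and 5% interest. All of these values and the variable names are hypothetical, and independence is assumed.

```python
RATE = 0.05   # assumed constant annual interest rate
# Hypothetical one-year death probabilities for (x) and (y) over a 5-year horizon.
qx = [0.02, 0.03, 0.04, 0.05, 1.00]
qy = [0.01, 0.02, 0.03, 0.04, 1.00]
b = [1000.0] * 5   # level death benefit, paid at the end of the year of death


def v(k):
    """Discount factor for k years at constant interest."""
    return (1.0 + RATE) ** (-k)


def survival(q):
    """[0p, 1p, 2p, ...] built from one-year death probabilities."""
    probs = [1.0]
    for qk in q:
        probs.append(probs[-1] * (1.0 - qk))
    return probs


px, py = survival(qx), survival(qy)
n = len(qx)

# (10.21), with the one-year contingent probability taken from (10.19).
A1_x = sum(b[k] * v(k + 1) * px[k] * py[k] * (qx[k] - 0.5 * qx[k] * qy[k])
           for k in range(n))
A1_y = sum(b[k] * v(k + 1) * px[k] * py[k] * (qy[k] - 0.5 * qx[k] * qy[k])
           for k in range(n))

# Joint-life insurance (10.6) and the single-life insurance on (x).
A_xy = sum(b[k] * v(k + 1) * px[k] * py[k] * (qx[k] + qy[k] - qx[k] * qy[k])
           for k in range(n))
A_x = sum(b[k] * v(k + 1) * px[k] * qx[k] for k in range(n))

A2_x = A_x - A1_x   # (10.23): value of the benefit paid if (x) dies second
print(round(A1_x, 2), round(A1_y, 2), round(A2_x, 2))
print(abs(A_xy - (A1_x + A1_y)) < 1e-9)  # True, as required by (10.22)
```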


10.8.3 Moment-of-death contingent insurances 


Moment-of-death contingent insurances can be calculated by a formula similar to (10.21) but 
with integrals replacing sums, and $\mu$ replacing $q$. If $\bar{A}_{\overset{1}{x}y}(b)$ is the present value of a contract 
paying b(t) at time t provided that (x) dies at time $t$ and (y) is alive at that time, then 


$$\bar{A}_{\overset{1}{x}y}(b) = \int_0^N b(t)\,v(t)\,{}_tp_{xy}\,\mu_x(t)\,dt. \tag{10.24}$$


We can intuitively verify this by noting that ${}_tp_{xy}\,\mu_x(t)\,dt$ represents the probability that both (x) 
and (y) survive up to time $t$ and then (x) dies at that time. Note also, from (10.12), that 


$$\bar{A}_{xy}(b) = \bar{A}_{\overset{1}{x}y}(b) + \bar{A}_{x\overset{1}{y}}(b),$$


which is obviously required. 

For second-death insurance, the same reasoning as in (10.23) shows that 


$$\bar{A}_{\overset{2}{x}y}(b) = \bar{A}_x(b) - \bar{A}_{\overset{1}{x}y}(b). \tag{10.25}$$


Example 10.5 Suppose (x) is subject to a constant force of mortality $\mu$, (y) is subject to a 
constant force of mortality $\nu$, and the force of interest is a constant $\delta$. Find $\bar{A}_{\overset{1}{x}y}$. 


Solution. From (10.24), 


$$\bar{A}_{\overset{1}{x}y} = \int_0^\infty e^{-\delta t}\,e^{-\mu t}\,e^{-\nu t}\,\mu\,dt = \frac{\mu}{\mu + \nu + \delta}.$$
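
The closed form in this solution is easily confirmed by numerical integration. In the sketch below the three constant forces are hypothetical values, the function names are ours, and the infinite integral is truncated at a horizon beyond which the integrand is negligible.

```python
import math


def contingent_value_closed(mu, nu, delta):
    """Closed form from Example 10.5: mu / (mu + nu + delta)."""
    return mu / (mu + nu + delta)


def contingent_value_numeric(mu, nu, delta, horizon=400.0, steps=200_000):
    """Midpoint-rule approximation of the integral in (10.24), truncated at `horizon`."""
    h = horizon / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * h
        # discount factor * joint survival probability * force of mortality of (x)
        total += math.exp(-(delta + mu + nu) * t) * mu
    return total * h


mu, nu, delta = 0.02, 0.03, 0.05   # hypothetical constant forces
print(contingent_value_closed(mu, nu, delta))
print(contingent_value_numeric(mu, nu, delta))
```

Setting the force of interest to zero in the closed form gives the ratio of the force of mortality of (x) to the sum of the two forces, which is the probability that (x) dies before (y).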


Normally, we must evaluate contingent insurances directly from the life table. In the case 
that benefits are constant over each year, the procedure is similar to (10.15) except that we get 
only one-half of the remainder term. This necessarily follows from (10.22) and the fact that 
R is symmetric in (x) and (y). That is, assuming UDD for each of the lives (x) and (y), 


$$\bar{A}_{\overset{1}{x}y}(b) = A_{\overset{1}{x}y}\!\left(\frac{i}{\delta} * b\right) + \frac{R}{2}, \tag{10.26}$$


where R is as defined in (10.16). 


10.8.4 General contingent probabilities 


Let ${}_sq_{\overset{1}{x}y}$ denote the probability that within s years (x) will die and (y) will be alive at the time of 
the death of (x). (So $q_{\overset{1}{x}y}$ as we have defined it above is just this symbol with s = 1.) Similarly, 
let ${}_sq_{\overset{2}{x}y}$ denote the probability that within s years (x) will die having been predeceased by (y). 




Consider first the case where s is a positive integer n. We can evaluate this easily by adding 
up the probabilities of the required event occurring in each year to obtain 


\[ {}_nq_{\overset{1}{x}y} = \sum_{k=0}^{n-1} {}_kp_{xy}\,q_{\overset{1}{x+k}:y+k}. \tag{10.27} \]


From (10.21) this is just $A_{\overset{1}{x}y}(1_n)$ at zero interest. For the general duration s we evaluate the
probability by taking the corresponding continuous insurance formula (10.24) at zero interest:


\[ {}_sq_{\overset{1}{x}y} = \int_0^s {}_tp_{xy}\,\mu_x(t)\,dt. \tag{10.28} \]


Note that we can take s = ∞ in the above to get the probability that (x) dies before (y). (There
was no point in doing this in the single-life case since ${}_\infty q_x$ is just equal to 1.) Some basic
identities are


\[ {}_\infty q_{\overset{1}{x}y} + {}_\infty q_{x\overset{1}{y}} = 1, \qquad {}_\infty q_{\overset{1}{x}y} + {}_\infty q_{\overset{2}{x}y} = 1. \]
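
As an illustration of (10.27), here is a small Python sketch that accumulates the yearly contingent probabilities. It is not from the original text. It assumes independent lives and UDD within each year, so that the one-year value is taken as q_x(1 − q_y/2); the function name and the mortality rates are hypothetical.

```python
def first_death_probability(qx, qy):
    """n-year probability that (x) dies with (y) still alive, via (10.27).

    qx[k], qy[k] are the one-year death rates at ages x+k and y+k.
    Assumes independence and UDD, so q^1_{x+k:y+k} = q_{x+k} * (1 - q_{y+k} / 2).
    """
    total = 0.0
    kp_xy = 1.0  # probability that both lives survive k years
    for q_x, q_y in zip(qx, qy):
        total += kp_xy * q_x * (1 - q_y / 2)
        kp_xy *= (1 - q_x) * (1 - q_y)
    return total

# Hypothetical three-year mortality rates.
qx = [0.10, 0.20, 0.30]
qy = [0.05, 0.10, 0.20]

x_first = first_death_probability(qx, qy)
y_first = first_death_probability(qy, qx)

survive_both = 1.0
for q_x, q_y in zip(qx, qy):
    survive_both *= (1 - q_x) * (1 - q_y)

print(x_first, y_first, 1 - survive_both)
```

The two contingent probabilities should add up to the probability that the joint-life status fails within n years, which gives a quick consistency check on any table.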


10.9 Duration problems 


Certain multiple-life problems involve a duration that runs from the time of death of an 
individual rather than from time zero. Normally, the best approach to finding present values 
is to refer to our basic formula where we sum or integrate the product of three factors. 
Complications can arise when computing the probability that payment will be made. In many 
cases a formula for this probability will vary with time, and we will need to break up our 
total time interval into appropriate subintervals, each with its own formula. Evaluation of the 
resulting integrals often involves the techniques of changing variables and reversing the order 
of integration or summation. We will illustrate with several examples involving continuous 
annuities or moment of death insurances. For simplicity, we assume throughout constant 
interest and independence of the two lives involved. In all cases we wish to find a formula 
for either a present value or a probability. Moreover we want the resulting formula to involve 
either single-life or joint-life statuses, or contingent probabilities and insurances. 


Example 10.6 An insurance contract provides for 1 to be paid at the moment of the death 
of (y) provided that (y) dies at least n years after the death of (x). Find a formula for the present 
value. 


Solution. We must consider two separate time periods. For a time t between 0 and n there is
no chance of payment being made. For a time t > n, payment is made provided (y) dies at time
t and (x) was not alive at time t − n. The present value is


\[ \int_n^\infty v^t\,{}_tp_y\,\mu_y(t)\,(1 - {}_{t-n}p_x)\,dt. \]
To write this in terms of our standard symbols, we change the variable from t to s = t − n,
which gives us an integral with lower limit 0. Then the integral becomes


\[ \int_0^\infty v^{s+n}\,{}_{s+n}p_y\,\mu_y(s+n)\,(1 - {}_sp_x)\,ds = v^n\,{}_np_y \int_0^\infty v^s\,{}_sp_{y+n}\,\mu_{y+n}(s)\,(1 - {}_sp_x)\,ds. \]


In the second equality above, we use the multiplication rule as well as the fact that by definition,
$\mu_z(r)$ depends only on the sum z + r. From the above formula we can write the solution in the
required form as


\[ v^n\,{}_np_y\left(\bar A_{y+n} - \bar A_{\overset{1}{y+n}:x}\right) = v^n\,{}_np_y\,\bar A_{\overset{2}{y+n}:x}. \]


We can deduce this intuitively if we do some fanciful reasoning. We tell (y) to age n years
while we keep (x) at the same age. We then let the two lives, now age y + n and x, ‘race’ to
see who dies second. We must multiply the result by an interest factor of $v^n$ to account for
the delay in the race, and also by ${}_np_y$, which is the probability that (y) will survive to be able to
participate in the race in the first place.
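
The result of Example 10.6 can be sanity-checked under constant forces. The Python sketch below is an illustration only, not part of the original text; the parameter values and the helper `midpoint` are assumptions. It compares a direct numerical evaluation of the integral over t > n with the standard-symbol form, using the constant-force values ν/(ν + δ) for the single-life insurance on (y) and ν/(μ + ν + δ) for the insurance payable if (y) dies first.

```python
import math

def midpoint(f, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Hypothetical constant forces for (x), (y), interest, and the waiting period n.
mu, nu, delta, n = 0.04, 0.03, 0.05, 10.0

# Direct integral: payment at t > n if (y) dies at t and (x) was dead at t - n.
direct = midpoint(
    lambda t: math.exp(-delta * t) * math.exp(-nu * t) * nu
    * (1 - math.exp(-mu * (t - n))),
    n, 400.0,
)

# Standard-symbol form: v^n * np_y * (single-life value - first-death value).
single_life = nu / (nu + delta)        # insurance on (y+n) alone
y_dies_first = nu / (mu + nu + delta)  # insurance payable if (y+n) dies before (x)
formula = math.exp(-(delta + nu) * n) * (single_life - y_dies_first)

print(direct, formula)
```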


Example 10.7 What is the probability that (x) and (y) will die at least n years apart from 
each other? 


Solution. The probability that (y) will die n or more years after the death of (x) is given by 
the solution to Example 10.6 with 0 interest. This is 


\[ {}_np_y\left(1 - {}_\infty q_{\overset{1}{y+n}:x}\right) = {}_np_y\;{}_\infty q_{\overset{1}{x}:y+n}. \]


Adding to this the probability that (x) dies n or more years after the death of (y) gives the 
solution of 


\[ {}_np_y\;{}_\infty q_{\overset{1}{x}:y+n} + {}_np_x\;{}_\infty q_{\overset{1}{y}:x+n}. \]


Example 10.8 A temporary life annuity on (y) provides for payments made continuously 
at the annual rate of 1 for n years, but with the payments beginning at the death of (x), rather 
than at time 0. Payments stop at the death of (y). Find a formula for the present value. 


Solution. For 0 < t < n, payment will be made provided (y) is alive and (x) is not. For t > n, 
the payment will be made provided (y) is alive, (x) is not alive but was alive n years before. 
The present value is 


\[ \int_0^n v^t\,{}_tp_y\,(1 - {}_tp_x)\,dt + \int_n^\infty v^t\,{}_tp_y\,({}_{t-n}p_x - {}_tp_x)\,dt = \bar a_y(1_n) - \bar a_{xy} + \int_n^\infty v^t\,{}_tp_y\,{}_{t-n}p_x\,dt, \]
where we rearrange and combine terms to get the expression on the right. Using the change
of variable technique of Example 10.6 on the final term (the substitution s = t − n turns
${}_tp_y\,{}_{t-n}p_x$ into ${}_np_y\,{}_sp_{y+n}\,{}_sp_x$) yields the answer of


\[ \bar a_y(1_n) - \bar a_{xy} + v^n\,{}_np_y\,\bar a_{x:y+n}. \]


Note that this is similar to the usual reversionary annuity formula except (using the same
fanciful reasoning as in Example 10.6), after n years we only award payment to (y) if living
jointly with an individual who is kept at age x while (y) is allowed to age n years.
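
A numerical check of Example 10.8 under constant forces is sketched below in Python. This is an illustration rather than part of the original text; the parameter values and the helper `midpoint` are assumptions. The closed form used is the n-year temporary annuity on (y), less the joint-life annuity, plus the deferred term obtained from the substitution s = t − n, in which (y) is aged n years and (x) is not.

```python
import math

def midpoint(f, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Hypothetical constant forces for (x), (y), interest, and the term n.
mu, nu, delta, n = 0.04, 0.02, 0.05, 10.0

# Direct evaluation of the two integrals in the solution.
first = midpoint(
    lambda t: math.exp(-(delta + nu) * t) * (1 - math.exp(-mu * t)), 0.0, n)
second = midpoint(
    lambda t: math.exp(-(delta + nu) * t)
    * (math.exp(-mu * (t - n)) - math.exp(-mu * t)),
    n, 400.0,
)
direct = first + second

# Closed form with constant forces.
a = delta + nu          # discount-plus-mortality rate for (y) alone
b = delta + nu + mu     # the same for the joint-life status
temporary_y = (1 - math.exp(-a * n)) / a   # n-year temporary annuity on (y)
joint = 1 / b                              # joint-life annuity on (x), (y)
deferred = math.exp(-a * n) / b            # v^n * np_y * joint annuity on x : y+n
formula = temporary_y - joint + deferred

print(direct, formula)
```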


Another contract of this type, which does not involve a duration directly, but is based on 
a similar theme, is a reversionary annuity where the benefits can depend on the time of death 
of the first person as well as the time they are made. 

To illustrate the technique, we first revisit a continuous version of the standard reversionary 
annuity as given in Example 10.4. Consider a contract with continuous payments at the annual 
rate of c(s) at time s, made if (y) is alive but (x) is not. As an alternate approach to that used 
before, we will view this as a contingent insurance, with a death benefit of $\bar a_{y+t}(c \circ t)$ paid at
time t, the moment of death of (x) if this occurs before the death of (y). Note the time shifting
necessary here. For example, if (x) dies at time 2, the rate of payment at a time 3 periods after
the annuity begins is $c(5) = (c \circ 2)(3)$. We can write the present value as


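\[ \int_0^\infty v^t\,{}_tp_x\,{}_tp_y\,\mu_x(t)\,\bar a_{y+t}(c \circ t)\,dt. \tag{10.29} \]
Note now that
\[ v^t\,{}_tp_y\,\bar a_{y+t}(c \circ t) = \int_t^\infty c(s)\,v^s\,{}_sp_y\,ds, \]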


since each side is the present value of a t-year deferred life annuity on (y) where the payment 
at time s for s > t is c(s). Substituting this into the integral gives us the present value as 


\[ \int_0^\infty\!\!\int_t^\infty c(s)\,v^s\,{}_sp_y\,{}_tp_x\,\mu_x(t)\,ds\,dt = \int_0^\infty\!\!\int_0^s c(s)\,v^s\,{}_sp_y\,{}_tp_x\,\mu_x(t)\,dt\,ds, \tag{10.30} \]


by reversing the order of integration. Now, 


\[ \int_0^s {}_tp_x\,\mu_x(t)\,dt = -\int_0^s \frac{d}{dt}\,{}_tp_x\,dt = 1 - {}_sp_x, \]


the probability that (x) dies before time s. From this we conclude that the present value of the
contract is


\[ \int_0^\infty c(s)\,v^s\,{}_sp_y\,(1 - {}_sp_x)\,ds = \bar a_y(c) - \bar a_{xy}(c). \]


This appears as a very complicated way of arriving at the same form of answer as that 
obtained in a much simpler fashion in Section 10.6. However, one benefit of this insurance 
approach is that it works just as well with the alternative type of reversionary annuities we
introduced above, and provides a convenient method for handling those. Suppose, for example,
that upon the death of (x), the periodic rate of payment to a surviving (y) is c(r) at a time r as
measured from the start of the annuity. The annuity would then begin with the periodic rate of
c(0). In (10.29) we then have c in place of $c \circ t$. For death at time t the rate of payment at a time
s > t would be c(s − t), and that would appear in (10.30) in place of c(s). This will normally
make for a rather complicated integration, but there are cases where it can be worked out, as
in the following example.


Example 10.9 Suppose that the force of mortality for (x) is a constant μ, the force of mortality
for (y) is a constant ν and the force of interest is a constant δ. Upon the death of (x) an annuity
will be paid to (y), if surviving, where the rate of payment at a time r after the annuity
payments begin is $e^{\gamma r}$, with 0 ≤ γ < ν + δ. Find the present value.


Solution. From the right side of (10.30), substituting $c(s - t) = e^{\gamma s}e^{-\gamma t}$ for c(s) as outlined, the
integral involving the variable t will be


\[ \int_0^s e^{-\gamma t}\,e^{-\mu t}\,\mu\,dt = \mu\,\frac{1 - e^{-(\mu+\gamma)s}}{\mu+\gamma}. \]


Substituting and integrating with respect to s gives the present value as


\[ \frac{\mu}{\mu+\gamma}\int_0^\infty e^{\gamma s}\,e^{-(\nu+\delta)s}\left(1 - e^{-(\mu+\gamma)s}\right)ds = \frac{\mu}{\mu+\gamma}\left[\frac{1}{\nu+\delta-\gamma} - \frac{1}{\nu+\delta+\mu}\right]. \]
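
The answer to Example 10.9 can be cross-checked against the contingent-insurance viewpoint of (10.29). With constant forces, the annuity that starts at the death of (x) is worth 1/(ν + δ − γ) at that moment whatever the time of death, so the present value should also equal μ/(μ + ν + δ) times that amount. The Python sketch below, which is illustrative only (function names and parameter values are assumptions), compares the two expressions.

```python
def pv_reversed_integration(mu, nu, delta, gamma):
    """Form obtained in Example 10.9 by reversing the order of integration."""
    return mu / (mu + gamma) * (1 / (nu + delta - gamma) - 1 / (nu + delta + mu))

def pv_contingent_insurance(mu, nu, delta, gamma):
    """Contingent insurance on (x) whose death benefit is the annuity value."""
    annuity_at_death = 1 / (nu + delta - gamma)  # requires gamma < nu + delta
    return mu / (mu + nu + delta) * annuity_at_death

# Hypothetical parameter values with 0 <= gamma < nu + delta.
mu, nu, delta, gamma = 0.03, 0.02, 0.05, 0.01
pv_a = pv_reversed_integration(mu, nu, delta, gamma)
pv_b = pv_contingent_insurance(mu, nu, delta, gamma)
print(pv_a, pv_b)
```

Algebraically the bracketed difference equals (μ + γ)/[(ν + δ − γ)(ν + δ + μ)], which is why the two forms coincide.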


*10.10 Applications to annuity credit risk 


Suppose that you are a prospective annuity purchaser (or an advisor to such), and have a list 
of various companies and the premiums they charge. The prices will vary, due to different 
interest and mortality assumptions, as well as a different treatment of many other factors that 
go into gross premium calculation. A natural tendency would be to select the one with the 
lowest premium, but that is not necessarily the best choice. This ignores credit risk which 
is the risk that the company may be unable to meet the promised payments due to financial 
difficulties. One may want to adjust the prices to reflect the different degrees of this risk 
among the various issuers. This could be an extensive task, and we will not go into details 
here. Our goal in this section is to indicate briefly how the theory of multiple-life contracts 
can be applied towards this endeavour. 

A first step might be to compare for a fixed interest and mortality basis, the cost of a 
life annuity calculated in the usual way with one that provides for payments only if a chosen 
company is viable. For this purpose we just treat the company as another individual, which
we'll denote by y, and which is in a survival state if it is in a position to meet its financial obligations.
We need to model ${}_tp_y$, which is the probability that the company is in a survival state at
time t. This will produce the associated force of failure, $\mu_y(t) = -\frac{d}{dt}\log {}_tp_y$. We can then
consider two-life contracts based on (x) and y. For simplicity we confine our discussion to a
life annuity on (x) that provides continuous payments at a level annual rate of K.
Taking into account the credit risk, the worth of the contract to the purchaser should be the
joint-life annuity $K\bar a_{xy} = K\int_0^\infty v^t\,{}_tp_x\,{}_tp_y\,dt$ rather than $K\bar a_x$. This assumes that no benefits are
paid after failure of the company. In practice, however, there is often a recovery rate r whereby
the company, upon financial failure, will continue to pay an income of r for every 1 unit of
promised annuity income. In this case the purchaser will get Kr if (x) only is surviving, and
K if both x and y are surviving, so that our general two-life annuity formula gives the value to
the purchaser as


\[ K\left[r\,\bar a_x + (1 - r)\,\bar a_{xy}\right], \]


which of course reduces to $K\bar a_x$ when r = 1.
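
To make the recovery-rate formula concrete, the following Python sketch evaluates it under the simplifying assumption of constant forces for both the annuitant and the issuing company. The constant-force model, the function name and the numbers are illustrative assumptions, not part of the original text.

```python
def annuity_value_with_recovery(K, r, mu_x, mu_y, delta):
    """K * [r * a_x + (1 - r) * a_xy] with constant forces.

    mu_x: force of mortality of the annuitant (assumed constant)
    mu_y: force of failure of the company (assumed constant)
    delta: force of interest
    """
    a_x = 1 / (mu_x + delta)          # single-life continuous annuity
    a_xy = 1 / (mu_x + mu_y + delta)  # joint-life continuous annuity
    return K * (r * a_x + (1 - r) * a_xy)

# Hypothetical inputs.
K, mu_x, mu_y, delta = 30_000, 0.05, 0.01, 0.04
no_default_risk = annuity_value_with_recovery(K, 1.0, mu_x, mu_y, delta)
no_recovery = annuity_value_with_recovery(K, 0.0, mu_x, mu_y, delta)
partial = annuity_value_with_recovery(K, 0.4, mu_x, mu_y, delta)
print(no_default_risk, partial, no_recovery)
```

The value with partial recovery always lies between the joint-life value (r = 0) and the ordinary single-life value (r = 1).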

Another complication arises in cases where government regulatory bodies provide guarantees
for annuities. There are many possible options concerning the amount of the guarantees
and how they relate to the recovery rates of the failing company. We will consider the following
particular example, which is encountered in some jurisdictions. Upon failure, the insurer
continues to pay at their recovery rate. The guarantor makes up the difference, but the value,
at interest and survivorship, at the time of failure, of all payments made by the guarantor is
limited to an amount G. Suppose, for example, that K = 30,000, G = 100,000, r = 0.4. In the
event of default, the issuing company would pay 12,000 each year, and the government body
would provide the annual shortfall of 18,000 but would pay it only until the present value
of 100,000 was exhausted, which should be somewhat over 6 years under normal interest and
mortality bases. We will arrive at a formula to compute the present value of a K-unit, whole
life annuity subject to these default arrangements. We assume that the mortality and interest
bases are the same for the issuer and the guaranteeing party.

Let $s_0 = \inf\{s : \bar a_{x+s} \le G/[(1 - r)K]\}$. Note that $s_0$ would equal 0 if $\bar a_x \le G/[(1 - r)K]$. For
failure of the issuer at a time t before time $s_0$ the guarantee will not cover the full annuity.
In that case the annuitant, who, in the absence of default, would have received total future
payments with a value at time t of $K\bar a_{x+t}$, will instead receive payments with a reduced value
at time t of $G + rK\bar a_{x+t}$. The deficit is $(1 - r)K\bar a_{x+t} - G$. We must subtract from the usual
present value the present value of a contingent insurance that provides this deficit on the
failure of y before time $s_0$, provided that (x) is then alive. Referring to equation (10.24), the
credit risk adjusted present value of the annuity is


\[ K\bar a_x - \int_0^{s_0} v^t\,{}_tp_x\,{}_tp_y\,\mu_y(t)\left[(1 - r)K\bar a_{x+t} - G\right]dt. \]
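
The guarantee formula can be explored numerically. The Python sketch below is an illustration under assumptions that are not in the original text: the annuitant follows de Moivre's law with a hypothetical limiting age of 100, the company has a constant force of failure `lam`, interest is a constant force `delta`, and the integral is approximated by the midpoint rule. Under these assumptions the annuity value decreases with age, so $s_0$ can be located by bisection.

```python
import math

OMEGA = 100.0  # hypothetical limiting age (de Moivre's law), an assumption

def a_bar(age, force):
    """Continuous life annuity at `age` under de Moivre, discounted at total force `force`."""
    m = OMEGA - age
    if m <= 0:
        return 0.0
    # (1 - (1 - e^{-force*m}) / (force*m)) / force, written with expm1 for accuracy
    return (1 + math.expm1(-force * m) / (force * m)) / force

def credit_adjusted_value(x, K, r, G, delta, lam, steps=20_000):
    """K * a_x minus the contingent insurance of the deficit up to time s0 (requires r < 1)."""
    m = OMEGA - x
    threshold = G / ((1 - r) * K)
    # s0 = inf{s : a_bar(x + s) <= threshold}, found by bisection.
    if a_bar(x, delta) <= threshold:
        s0 = 0.0
    else:
        lo, hi = 0.0, m
        for _ in range(60):
            mid = (lo + hi) / 2
            if a_bar(x + mid, delta) <= threshold:
                hi = mid
            else:
                lo = mid
        s0 = hi
    h = s0 / steps
    deficit_pv = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        # v^t * tp_x * tp_y * mu_y(t)
        density = math.exp(-delta * t) * (1 - t / m) * math.exp(-lam * t) * lam
        deficit_pv += density * ((1 - r) * K * a_bar(x + t, delta) - G)
    return K * a_bar(x, delta) - deficit_pv * h

# Hypothetical inputs, using the K, G and r of the text's illustration.
x, K, r, delta, lam = 65.0, 30_000.0, 0.4, 0.04, 0.01
with_guarantee = credit_adjusted_value(x, K, r, 100_000.0, delta, lam)
no_guarantee = credit_adjusted_value(x, K, r, 0.0, delta, lam)
full_guarantee = credit_adjusted_value(x, K, r, 1e9, delta, lam)
print(no_guarantee, with_guarantee, full_guarantee)
```

Two limiting cases give a check on the sketch: with G = 0 the result should match the recovery-rate formula $K[r\,\bar a_x + (1 - r)\,\bar a_{xy}]$, and with a very large G it should equal $K\bar a_x$.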


10.11 Standard notation and terminology 


Particular examples of the multiple-life notation have already been encountered in the standard
notation for term and endowment insurances and temporary annuities. In place of the life (y),
we dealt with a period of n years, denoted by $\overline{n}|$, which ‘survives’ for exactly n years and
then ‘fails’ at time n. So the standard symbol for the present value of a 1-unit, n-year term
insurance premium, $A_{\overset{1}{x}:\overline{n}|}$, does indeed signify that the policy provides for 1 unit at the end of
the year of death provided (x) dies before the ‘failure’ of $\overline{n}|$. Similarly, the standard symbol
for the present value of a 1-unit, n-year endowment insurance, $A_{x:\overline{n}|}$, indicates that the contract
provides for 1 unit to be paid at the death of (x) or at time n if earlier, that is, on the first
failure of x and $\overline{n}|$. The 1-unit, n-year, temporary annuity present value symbol $\ddot a_{x:\overline{n}|}$ indicates
that the contract provides for payments to continue as long as both (x) is alive and $\overline{n}|$ is ‘alive’,
meaning that the n-year period has not run out.


10.12 Spreadsheet applications 


We can adapt the Chapter 6 spreadsheet to be applicable to joint-life annuities and insurances.
In the first place we will make provision to insert two life tables, one for (x), and one for (y).
This is common in practice as often one life will be male, and the other female, so different
tables apply. Start by inserting a new column D, which we intend to show the values of $q_{y+k}$.
This will move all the existing columns over one to the right. In cell D1 we put the age y.
Copy the formula from C10 to D10. It should be the same except with C1 replaced by D1
and a reference to column P. (The reference in column C will be automatically changed from
column N to column O.)

We can then insert a new table in column P by putting the parameters in P3 and P4, copying
the formula from O10 to P10 and copying down.

In column F, which now contains $y_{xy}$, we change the formula in F11 by appending
*(1 - D10) and copy down.

In column M, which now contains $w_{xy} * b$, we change the formula in M10 by replacing
C10 by (C10 + D10 - C10*D10) in order to multiply by $q_{xy}$ rather than $q_x$.

As a test exercise suppose that our example table applies to females and that for the male 
table the mortality at each age up to 119 is 1.25 times the female rate. Interest rates are a 
constant 6%. An insurance on a male age 50 and a female age 45 provides 10,000 at the end 
of the year of the first death. Find the net level premium payable yearly while both lives are 
living. The answer is 208.75. 
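
For readers who prefer code to a spreadsheet, the same recursion can be sketched in Python. The block below is only an illustration: the mortality function `female_q` is a made-up stand-in for the book's sample table (which is not reproduced here), so it is not expected to give the premium quoted above. It builds $q_{xy}$ = $q_x + q_y - q_x q_y$ year by year, exactly as in the modified column M, and accumulates the joint-life insurance and annuity-due values.

```python
def first_death_values(qx, qy, i):
    """Return (A_xy, annuity-due a_xy) for a 1-unit first-death insurance at rate i."""
    v = 1 / (1 + i)
    kp_xy, disc = 1.0, 1.0  # k-year joint survival probability and v^k
    insurance = annuity_due = 0.0
    for q_x, q_y in zip(qx, qy):
        q_xy = q_x + q_y - q_x * q_y     # same expression as C10 + D10 - C10*D10
        annuity_due += disc * kp_xy
        insurance += disc * v * kp_xy * q_xy
        kp_xy *= 1 - q_xy
        disc *= v
    return insurance, annuity_due

def female_q(age):
    """Hypothetical female mortality rates, for illustration only."""
    return 1.0 if age >= 119 else min(1.0, 0.0002 * 1.09 ** age)

def male_q(age):
    """Male rates taken as 1.25 times the female rates up to age 119."""
    return 1.0 if age >= 119 else min(1.0, 1.25 * female_q(age))

qx = [male_q(50 + k) for k in range(70)]    # male age 50 onwards
qy = [female_q(45 + k) for k in range(70)]  # female age 45 onwards
A, a_due = first_death_values(qx, qy, 0.06)
premium = 10_000 * A / a_due
print(round(premium, 2))
```

A useful check on any such calculation is the identity $A_{xy} = 1 - d\,\ddot a_{xy}$ with d = i/(1 + i), which must hold when the table runs to extinction.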


Notes and references 


An extensive study of dependence in two-life annuity contracts can be found in Frees et al.
(1996).

More advanced examples of multiple-life contracts, involving an arbitrary number of lives, 
can be found in Jordan (1967, Part II), or in Bowers et al. (1997, Chapter 18). 

The result from real analysis that justifies the reversal of integration in formula (10.30) 
can be found in Royden (1988) Theorem 12.19 (iii). 


Exercises 


In all the exercises for this chapter, we assume that (10.1) holds. 


Type A exercises 


10.1 You are given the following information from a life table: $\ell_{80} = 100$, $\ell_{81} = 80$, $\ell_{82} = 40$,
$\ell_{83} = 20$. You also have the vector c = (1000, 1200, 1500) and the vector b =
(1000, 2000). The interest rate is a constant 25%. Find (a) $p_{80:81}$ and ${}_2p_{80:81}$, (b) $p_{\overline{80:81}}$
and ${}_2p_{\overline{80:81}}$, (c) $\ddot a_{80:81}(c)$, (d) $A_{80:81}(b)$, (e) $\ddot a_{\overline{80:81}}(c)$, (f) $A_{\overline{80:81}}(b)$.

10.2 Suppose that ${}_5p_x = 0.8$, ${}_6p_x = 0.7$, ${}_5p_y = 0.6$, ${}_6p_y = 0.4$. Find the probability that

both (at least one) (exactly one) (neither) of the two lives
live 5 years (die within the next 5 years) (die between time 5 and time 6).

There are 12 answers required here.

10.3 You are given that $p_{70} = 0.9$, ${}_2p_{70} = 0.8$, ${}_3p_{70} = 0.7$. The interest rate is 20% for the
first year and 25% for the second year. Find the present value of the benefits on a
2-year term insurance policy sold to two lives (70) and (71) that provides benefits
upon the first death, payable at the end of the year of death. The amount of the death
benefit is 1000 if the first death occurs in the first year, and 2000 if it occurs in the
second year.

10.4 An insurance policy pays 1 unit at the moment of the second death of (x) and (y).
Level premiums are payable continuously as long as either of the two is alive. Given
that (x) is subject to a constant force of mortality of 0.1, (y) is subject to a constant
force of mortality of 0.3 and the force of interest is a constant 0.1, what is the annual
rate of premium payment?

10.5 For two lives (x) and (y), the forces of mortality are given by $\mu_x(t) = 0.04$ for all t,
and $\mu_y(t) = 1/(20 - t)$ for 0 ≤ t < 20.

(a) Find the probability that the joint-life status (xy) will survive to time 10.

(b) Find the probability that the last-survivor status $(\overline{xy})$ will fail before time 10.

(c) An insurance contract provides for a death benefit paid at the moment of the first
death of (x) and (y). The amount of the benefit is $e^{0.1t}$ for death at time t. The force
of interest is a constant 0.06. Find the present value of the benefits.

10.6 Suppose that Demoivre's law holds with ω = 100. Consider two lives (80) and (60).

(a) What is the probability that the first death will occur between time 5 and time 10?

(b) What is the probability that the second death will occur between time 5 and
time 10?

10.7 The following is a portion of a select and ultimate table with a select period of 2 years.

x     $q_{[x]}$   $q_{[x]+1}$   $q_{x+2}$   x+2
60    0.08    0.14    0.22    62
61    0.09    0.20    0.25    63
62    0.10    0.20    0.30    64

A and B are both age 62. A is a newly selected life, while B was first selected at age
61. What is the probability that the second death of A and B will occur between 2 and
3 years from now?


10.8 An insurance contract sold to (x) and (y) provides for a death benefit of 1 unit at the
moment of the first death and 3 units at the moment of the second death. Annual
premiums are payable while at least one of the two is living. The annual rate of
premium payment reduces to three quarters of the initial rate upon the first death. You
are given that (x) is subject to a constant force of mortality of 0.2, and (y) is subject
to a constant force of mortality of 0.3. The force of interest is a constant 0.1. Find the
initial annual rate of premium payment.

10.9 A contract on two lives (x) and (y) provides that if (y) dies first, (x) will receive a life
annuity of 1 per year, starting at the end of the year of (y)'s death, while if (x) dies
first, (y) will receive a life annuity of 2 per year, starting at the end of the year of (x)'s
death. Level annual premiums of π are payable while both (x) and (y) are alive. Given
that $\ddot a_x = 16$, $\ddot a_y = 13$ and $\ddot a_{xy} = 10$, find π.

10.10 You are given two independent lives (x) and (y) for which (x) is subject to a constant
force of mortality of log(4/3), while (y) is subject to a force of mortality at time t of
1/(10 − t) for 0 ≤ t < 10.

(a) What is the probability that the first death among these two will occur between
time 2 and time 3?

(b) What is the probability that the second death among these two will occur between
time 2 and time 3?

10.11 You are given that $q_{70} = 0.2$, $q_{71} = 0.25$, $q_{72} = 0.4$, $q_{73} = 0.6$. The interest rate is a
constant 25%. The vector b is (1000, 2000, 3000). Compute (a) $A_{\overset{1}{70}:71}(b)$, (b) $A_{\overset{1}{71}:70}(b)$,
(c) $A_{\overset{2}{70}:71}(b)$, (d) $A_{\overset{2}{71}:70}(b)$.


Type B exercises 


10.12 An annuity contract issued to two lives, (40) and (50), provides for payments of 1 per
year, to be made provided that either or both of the following conditions hold.

(i) (40) is alive and under age 60;
(ii) (50) is alive and over age 60.

Find a formula for the present value of the benefits using terms of the form $\ddot a_x(c)$
or $\ddot a_{xy}(c)$.

10.13 Repeat Exercise 10.12, but now with (ii) stating that (50) is alive and between age 60
and age 80.

10.14 An insurance policy sold to (x) and (y) provides for a payment of 2 units on the first
death and 3 units on the second death. Level annual premiums are payable for as long
as either individual is living, with the premiums reducing by one-third upon the first
death. Find an expression for the initial premium in terms of $A_x$, $A_y$, $A_{xy}$, $\ddot a_x$, $\ddot a_y$ and
$\ddot a_{xy}$.

10.15 An annuity sold to (x) and (y) provides for annual payments for 20 years, provided at
least one of the two is alive. The annual payment begins at 12 units. During the first
10 years, the payments remain at 12. However, during the second 10-year period, the
payment reduces to 6 if (x) only is alive, or to 8 if (y) only is alive. Write the present
value in terms of single- and joint-life annuities.

10.16 An annuity contract sold to (40) and (50) provides for yearly payments if either life
is living and under age 70. The yearly payment is 6 if both are alive, 4 if (40) only is
alive, and 5 if (50) only is alive. Find a formula for the present value of the benefits
using terms of the form $\ddot a_x(c)$ and $\ddot a_{xy}(c)$.

10.17 For the policy of Exercise 10.8, calculate ${}_tV$.

10.18 Suppose that Demoivre's law holds. Show that (10.19) holds exactly for a general left
subscript n in place of 1. That is,

\[ {}_nq_{\overset{1}{x}y} = {}_nq_x - \tfrac{1}{2}\,{}_nq_x\,{}_nq_y. \]

10.19 For the two lives of Exercise 10.10, find the probability that (x) will die before (y).

10.20 Given two lives (x) and (y) and times s < t, consider the following four events: A, the
joint-status (xy) fails between time s and time t; B, the last-survivor status $(\overline{xy})$ fails
between time s and time t; C, at least one of (x) and (y) fails between time s and time
t; and D, both (x) and (y) fail between time s and time t. Rank the probabilities of
these from largest to smallest to the extent that you are able, showing clearly those
events that you cannot compare without further information.

10.21 Suppose that interest is a constant 25% and that $q_x = 0.2$. Assume UDD. Find the
ratio of $\bar A_{xx}(1_1)$ to $(i/\delta)A_{xx}(1_1)$.

10.22 The probability ${}_nq_{\overset{1}{x}y}$ can be generalized to ${}_nq_{\overset{1}{x_1}x_2\cdots x_k}$, which is the probability that
out of a group of k lives, $(x_1), (x_2), \ldots, (x_k)$, $(x_1)$ will die within n years and be the first
to die. Express the following in terms of first-death probabilities:

(a) the probability that (x) will die within n years and will be the second to die among
a group of three lives (x), (y), and (z);

(b) the probability that (x) will die within n years and be the third to die among (x), (y),
and (z).

10.23 Find a formula to calculate $\bar A_{x\overset{2}{y}}(b)$ in terms of

(a) $\bar A_{\overset{1}{x}y}(b)$, $\bar A_y(b)$, $\bar A_{xy}(b)$.

(b) $\bar A_{\overset{2}{x}y}(b)$, $\bar A_{\overline{xy}}(b)$.

10.24 Suppose that ${}_nq_x = 0.2$, and ${}_nq_y = 0.3$. What is ${}_nq_{\overset{1}{x}y} - {}_nq_{x\overset{2}{y}}$?

10.25 An insurance pays 1 at the death of (y) provided that (x) was alive n years before this
death. Assume constant interest. Find a formula for the present value in two ways.
First by setting up and evaluating the necessary integrals. Secondly, by using Example
10.6 to avoid any integration.


10.26 An annuity provides for continuous payments at the annual rate of 1, while both (x)
and (y) are alive, and in addition continues the payments for n more years after the
first death, as long as the survivor is living. Payments stop completely upon the second
death. Find a formula for the present value. Verify that your answer gives the correct
result in the case that n = 0.

10.27 In Example 10.9 take ν = δ = 0 and γ < 0.

(a) Evaluate the present value and give an intuitive explanation of the answer.

(b) Take the parameters as in part (a), but now consider the simpler type of reversionary
annuity where the rate of payment at time r (measured as usual from 0) is $e^{\gamma r}$. Show
that the present value is an increasing function of μ and explain why this is so.

10.28 In this problem all death benefits are payable at the moment of death, and all premiums
are payable continuously at a level rate. A life insurance contract on (x) and (y) provides
for death benefits as follows. If (x) dies first at time t, the amount paid is the reserve
at time t on a 1-unit whole life insurance on (y) with premiums payable for life. If
(y) dies first at time t, the amount paid is the reserve at time t for a 1-unit whole life
insurance on (x) with premiums payable for life. Premiums are payable until the first
death. Show that the rate of premium payment is $\bar P_x + \bar P_y - \bar P_{xy}$, where $\bar P_x$ denotes
the rate of premium payment for a 1-unit whole life insurance on x with premiums
payable for life, and $\bar P_y$ and $\bar P_{xy}$ denote similar quantities for y and xy, respectively.
You should demonstrate this mathematically and also give a verbal explanation.

10.29 A contract on (x) and (y) provides the following death benefits, payable at the
moment of death. If (x) dies first, 10 is paid upon the death of (x) and 30 is paid
upon the death of (y). If (y) dies first, 20 is paid upon the death of (x) and 40 is paid
upon the death of (y). Find the present value given that

\[ \bar A_x = 0.30, \quad \bar A_y = 0.45, \quad \bar A_{xy} = 0.50, \quad \bar A_{\overset{1}{x}y} = 0.20. \]

10.30 An annuity provides continuous payments at the annual rate of 1 to (y) while living,
beginning n years after the death of (x). Assuming constant interest, find a formula
for the present value.


Spreadsheet exercise 


10.31 Suppose that our sample table applies to females and the mortality rate for males is
1.25 times that for females up to age 119. A is a male age 50 and B is a female age
40. The interest rate is a constant 5%.

(a) Find the level annual premium for an insurance paying 10 000 at the end of the
year of first death of A and B with premiums payable while both are alive.

(b) Find the level annual premium for an insurance paying 10 000 at the end of the
year of the death of A provided this occurs before the death of B. Premiums are
payable while both are alive.

(c) Assuming UDD, find the probability that A will die before B.


11 


Multiple-decrement theory 


11.1 Introduction 


Our discussion of insurance contracts up to now has been concerned with benefits payable 
upon the occurrence of death. We now investigate situations when an insured is at the same 
time subject to several different events that can have financial impact. 

One example, which we mentioned earlier, is the event of withdrawal or lapse, whereby 
an insured life terminates the contract and receives a cash value. To properly model a life 
insurance contract, one must consider two causes of termination, death and withdrawal, 
operating simultaneously. 

Sometimes the insurer must distinguish between different causes of death. For example, 
some policies have a feature that provides additional death benefits for accidental death as 
opposed to death from natural causes. 

Some policies include disability benefits, providing income for people who can no longer 
work. For such, the insurer must consider the possibility of disability, as well as death and 
withdrawal. 

For employees covered under a pension plan, there are at least four events of interest: 
disability; death; termination of employment; and retirement. 

These are just a few of the possibilities, and many more can arise. In this chapter we 
analyze the general situation. 


11.2 The basic model 


We suppose that we have m different causes of failure operating simultaneously on a group 
of lives. These causes are often referred to as decrements, since they bring about a decrease 
in the number of lives under observation. The insurance policies introduced in Chapter 5 
involved only one decrement, namely that of death. When there are several decrements we
are concerned with what is known in the actuarial literature as multiple-decrement theory, or,
in biostatistical contexts, the theory of competing risks.

There is a generalization of these ideas to multi-state insurances and annuities, where
individuals can transfer freely among several states. We deal with this in Chapter 19,
which requires a basic knowledge of Markov chains, as given in Chapter 18.


11.2.1 The multiple-decrement table 


It is convenient to make use of a generalized life table, known as a multiple-decrement table. 
We will number our causes from 1 to m and use a superscript (j) to refer to cause j. We will
also use a superscript (τ), which the reader can interpret as meaning total. We begin with
an arbitrary number $\ell_0^{(\tau)}$ of lives age 0. We let $\ell_x^{(\tau)}$ denote the number of individuals from
this group who are still surviving at age x. That is, they have not succumbed to any of the m
causes. We let $d_x^{(j)}$ denote the number of lives who will fail first from cause j between the
ages of x and x + 1. Let


\[ d_x^{(\tau)} = \sum_{j=1}^m d_x^{(j)}, \tag{11.1} \]


which is the number of people who will fail from some cause between the ages of x and x + 1.
It follows that


\[ \ell_{x+1}^{(\tau)} = \ell_x^{(\tau)} - d_x^{(\tau)}. \tag{11.2} \]
That is, the number of survivors at age x + 1 is equal to the number of survivors at age x, less 


those who failed from some cause between the ages of x and x + 1. The following is a portion 
of a sample table with two decrements: 


x     $\ell_x^{(\tau)}$     $d_x^{(1)}$     $d_x^{(2)}$
0     1000     50      100
1     850      60      105
2     685      70      120
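
The relations (11.1) and (11.2) are easy to verify on the sample table. The short Python sketch below is an illustration, not part of the original text; the function name is hypothetical. It rebuilds the survivor column from the starting number of lives and the two decrement columns.

```python
def survivors(l0, decrements):
    """Build the survivor column from l0 and rows of decrements (d^(1), ..., d^(m))."""
    column = [l0]
    for row in decrements:
        total_decrement = sum(row)                    # (11.1)
        column.append(column[-1] - total_decrement)   # (11.2)
    return column

# Decrement columns of the sample table above.
table = survivors(1000, [(50, 100), (60, 105), (70, 120)])
print(table)
```

The first three entries reproduce the survivor column shown in the table, and the fourth is the number of survivors at age 3 implied by the last row.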


It is important to note the word ‘first’ used in the definition of $d_x^{(j)}$. The model assumes that
any cause of failure results in the individual leaving the group, so they are no longer under
observation. For example, if cause 1 denotes death and cause 2 withdrawal, a policyholder
who withdraws at age 60¼ and then dies at age 60½ would be included in $d_{60}^{(2)}$ but not in $d_{60}^{(1)}$.
This is what we want in the common applications. In the case of withdrawal, the insurer would 
pay the policyholder their cash value, the policy would terminate, and the insurer would have 
no further interest in the time of death of this person. Therefore, whenever we refer to failing 
from a certain cause j in the multiple-decrement model, it is understood that this means that 
this occurs before failure from any other cause. 

We defined a multiple-decrement table starting at age 0, but this could be some other age, 
depending upon the particular application. For example, the multiple-decrement table for an 
employee pension plan would begin at the first age the employees become eligible for the 
plan, perhaps age 25 or so. 
Our model assumes that the multiple-decrement table, as given, can be used for all ages
x. That is, it is an aggregate table that does not show selection effects. We could define for 
each x, as we did for the single-decrement case, a select multiple-decrement table to apply to 
people first observed at age x. This is important in practice since many causes of decrement, 
such as withdrawal, will certainly depend more on duration since the policy began than 
on attained age. However, we will not get into these details in this chapter and will work 
with the aggregate model. The more general case can be deduced from the material of 
Chapter 17. 


11.2.2 Quantities calculated from the multiple-decrement table 
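We first define probabilities of failure. Let


\[ q_x^{(j)} = \frac{d_x^{(j)}}{\ell_x^{(\tau)}} \]


be the probability that (x) will fail first from cause j within 1 year. In practice, one would
start with these probabilities and construct the multiple-decrement table inductively, by
calculating


\[ d_x^{(j)} = \ell_x^{(\tau)}\,q_x^{(j)} \]


and then using (11.1) and (11.2) to complete the table. Let


\[ {}_kp_x^{(\tau)} = \frac{\ell_{x+k}^{(\tau)}}{\ell_x^{(\tau)}} \tag{11.3} \]


be the probability that (x) will survive to age x + k without succumbing to any cause.
Probabilities are calculated from the table with similar formulas as in the single-cause life
table. One must only remember that the denominators are $\ell_x^{(\tau)}$. For example,


\[ {}_nq_x^{(j)} = \frac{\sum_{k=0}^{n-1} d_{x+k}^{(j)}}{\ell_x^{(\tau)}} \]


is the probability that (x) will fail first from cause j within n years, while


\[ {}_{k|}q_x^{(j)} = \frac{d_{x+k}^{(j)}}{\ell_x^{(\tau)}} = {}_kp_x^{(\tau)}\,q_{x+k}^{(j)} \tag{11.4} \]


is the probability that (x) will fail from cause j in the time interval k to k + 1.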
Another quantity of interest is

$$\ell_x^{(j)} = \sum_{k=0}^{\omega-x-1} d_{x+k}^{(j)}.$$

This represents the number of people in our group who will fail from cause j sometime after
age x. We assume in our model that everyone will eventually fail from some cause (which is
obvious if death is one of the causes) so we can write

$$\ell_x^{(\tau)} = \sum_{j=1}^{m} \ell_x^{(j)}.$$

Knowing the value of $\ell_x^{(j)}$ for all integral values of x and all j allows us to complete the table
since

$$d_x^{(j)} = \ell_x^{(j)} - \ell_{x+1}^{(j)}.$$

Note that we can write

$${}_nq_x^{(j)} = \frac{\ell_x^{(j)} - \ell_{x+n}^{(j)}}{\ell_x^{(\tau)}}. \tag{11.5}$$

Another symbol we can define is

$${}_np_x^{(j)} = \frac{\ell_{x+n}^{(j)}}{\ell_x^{(\tau)}}, \tag{11.6}$$

which is the probability that (x) will fail first from cause j after time n. Here, instead of
thinking of the basic symbol ${}_np$ as denoting survival to time n, we think of the equivalent
formulation as failing after time n.
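The inductive construction described in this section can be sketched in a few lines of Python. This is only an illustration: the function names, the radix of 1000 and the two rows of one-year rates are hypothetical, and exact fractions are used so that the table identities hold without rounding.

```python
from fractions import Fraction

def build_table(radix, q_rows):
    """Build l^(tau) and d^(j) inductively from one-year rates q^(j).

    q_rows[k][j] is the probability that a life aged x+k fails first
    from cause j within one year (hypothetical illustrative input).
    """
    ell = [Fraction(radix)]
    d = []
    for q in q_rows:
        d_row = [ell[-1] * Fraction(qj) for qj in q]  # d = l^(tau) * q^(j)
        d.append(d_row)
        ell.append(ell[-1] - sum(d_row))              # survivors one year later
    return ell, d

def k_p(ell, k):
    """Probability of surviving all causes for k years, as in (11.3)."""
    return ell[k] / ell[0]

def n_q(ell, d, j, n):
    """Probability of failing first from cause j within n years."""
    return sum(d[k][j] for k in range(n)) / ell[0]

def deferred_q(ell, d, j, k):
    """Probability of failing from cause j between times k and k+1, as in (11.4)."""
    return d[k][j] / ell[0]

# Hypothetical two-cause rates for ages x and x+1.
ell, d = build_table(1000, [("0.10", "0.30"), ("0.05", "0.20")])
print([int(v) for v in ell], [[int(v) for v in row] for row in d])
```

Since every life either survives both years or fails first from exactly one cause, `k_p(ell, 2)` and the two values of `n_q(ell, d, j, 2)` should sum to 1, which gives a quick consistency check on any table built this way.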


11.3 Insurances 


Consider an insurance policy on (x) that provides, for all relevant values k and j, a payment of
$b_k^{(j)}$ at time k + 1, provided that failure occurs from cause j in the year k to k + 1. We can view
this as m separate policies, where the jth policy pays benefits only for failure from cause j.
The formulas are exactly the same as (5.1), but with the probabilities taken from (11.4). That
is, the present value of the jth policy will be

$$\sum_{k=0}^{\omega-x-1} b_k^{(j)}\, v(k+1)\, {}_kp_x^{(\tau)}\, q_{x+k}^{(j)} = \sum_{k=0}^{\omega-x-1} b_k^{(j)}\, v(k+1)\, \frac{d_{x+k}^{(j)}}{\ell_x^{(\tau)}}, \tag{11.7}$$

and the total present value would be obtained by summing over all j.
To handle insurances payable at the moment of failure we will need the following: 


Definition 11.1 For each cause j, let

$$\mu^{(j)}(x) = \lim_{h \to 0} \frac{{}_hq_x^{(j)}}{h} = \lim_{h \to 0} \frac{\ell_x^{(j)} - \ell_{x+h}^{(j)}}{h\,\ell_x^{(\tau)}}. \tag{11.8}$$

The force of decrement at time t, from cause j for (x), is the quantity given by

$$\mu_x^{(j)}(t) = \mu^{(j)}(x + t). \tag{11.9}$$

(In the case of select multiple-decrement tables we would need to define $\mu_x^{(j)}(t)$ separately
for each age x.)


Analogously to (8.15), and noting that the quantities in (11.5) and (11.6) differ by a constant,
we can write

$$\mu_x^{(j)}(t) = \frac{-\frac{d}{dt}\, {}_tp_x^{(j)}}{{}_tp_x^{(\tau)}} = \frac{\frac{d}{dt}\, {}_tq_x^{(j)}}{{}_tp_x^{(\tau)}}. \tag{11.10}$$


Consider the continuous analogue of the insurance whose present value is given by (11.7).
This is a contract that pays a benefit of $b^{(j)}(t)$ at time t, should failure from cause j occur
at that time. We denote the present value by $\bar{A}_x^{(j)}(b^{(j)})$. It is given by

$$\bar{A}_x^{(j)}(b^{(j)}) = \int_0^{\infty} b^{(j)}(t)\, v(t)\, {}_tp_x^{(\tau)}\, \mu_x^{(j)}(t)\, dt. \tag{11.11}$$

This reflects the fact that in order to collect at time t, (x) must survive all causes up to time t,
and then succumb to cause j at time t. We discuss the evaluation of this integral in Section 11.5.

For policies which are purchased by annual premiums, payable until the first failure, we
determine the initial premium as in the single-decrement case, by simply dividing the present
value by

$$\ddot{a}_x^{(\tau)}(\rho) = \sum_{k=0}^{\infty} v(k)\, \rho_k\, {}_kp_x^{(\tau)},$$

where $\rho$ is the premium pattern vector.
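As a rough illustration of (11.7) and of the premium calculation, the sketch below values a two-year, two-cause policy from a small table. Everything numerical here is hypothetical: the table, the level benefits of 1 and 2, and the discount factor. It also assumes a constant interest rate, so that v(k) is the kth power of v, and a level premium pattern.

```python
from fractions import Fraction

# Hypothetical two-year, two-cause table: survivors and first failures by cause.
ell = [Fraction(1000), Fraction(600), Fraction(450)]
d = [[Fraction(100), Fraction(300)], [Fraction(30), Fraction(120)]]

def insurance_pv(ell, d, benefits, v):
    """Sum over causes j and years k of b^(j) * v^(k+1) * d_{x+k}^(j) / l_x^(tau)."""
    total = sum(
        benefits[j] * v ** (k + 1) * d[k][j]
        for k in range(len(d))
        for j in range(len(d[k]))
    )
    return total / ell[0]

def annuity_pv(ell, n_years, v):
    """Value of 1 paid at times 0, ..., n_years - 1 while no failure has occurred."""
    return sum(v ** k * ell[k] for k in range(n_years)) / ell[0]

v = Fraction(4, 5)        # assumed constant discount factor (interest of 25%)
benefits = [1, 2]         # assumed level benefits for cause 1 and cause 2
pv = insurance_pv(ell, d, benefits, v)
premium = pv / annuity_pv(ell, 2, v)
print(float(pv), float(premium))
```

Dividing the benefit value by the annuity value is the same step as in the single-decrement case; the only change is that both are computed from the total-decrement survivors.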


11.4 Determining the model from the forces of decrement 


The multiple-decrement model is often given from the outset by specifying the forces of
decrement, and we must use these to calculate probabilities. Note that in the last expression
in (11.8) the numerator involves $\ell^{(j)}$ while the denominator involves $\ell^{(\tau)}$. This means that we
cannot apply a version of (8.16) directly. We can, however, define the total force

$$\mu_x^{(\tau)}(t) = \sum_{j=1}^{m} \mu_x^{(j)}(t) = \lim_{h \to 0} \frac{\ell_{x+t}^{(\tau)} - \ell_{x+t+h}^{(\tau)}}{h\,\ell_{x+t}^{(\tau)}}.$$

So $\mu_x^{(\tau)}$ is the same type of quantity as the single-life force of mortality introduced in Chapter 8,
except based on the total decrement rather than just on failure by death. We can now apply
(8.16) to deduce that

$${}_tp_x^{(\tau)} = e^{-\int_0^t \mu_x^{(\tau)}(r)\, dr}. \tag{11.12}$$

Arguing as we did in deriving (10.28), apply (11.11) with a constant interest rate of 0, and
b(t) = 1 for 0 < t < s, and 0 elsewhere, to conclude that

$${}_sq_x^{(j)} = \int_0^s {}_tp_x^{(\tau)}\, \mu_x^{(j)}(t)\, dt. \tag{11.13}$$


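Example 11.1 In a model with two decrements, you are given that $\mu_x^{(1)}(t) = 0.02$ and
$\mu_x^{(2)}(t) = 0.03$ for all $t \ge 0$. Find ${}_sq_x^{(j)}$ for j = 1, 2.


Solution. $\mu_x^{(\tau)}(t) = 0.05$, so that ${}_tp_x^{(\tau)} = e^{-0.05t}$ and, from (11.13),

$${}_sq_x^{(1)} = \int_0^s 0.02\, e^{-0.05t}\, dt = \tfrac{2}{5}\left(1 - e^{-0.05s}\right), \qquad {}_sq_x^{(2)} = \tfrac{3}{5}\left(1 - e^{-0.05s}\right).$$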
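For constant forces such as those in Example 11.1, the integral in (11.13) has a closed form for any horizon s, and it can be checked against a direct numerical evaluation. The sketch below does this; the function names and the horizon of 10 years are arbitrary choices made for illustration.

```python
import math

def s_q_constant(mu_j, mu_total, s):
    """Closed form of (11.13) when every force is constant on [0, s]."""
    return mu_j / mu_total * (1.0 - math.exp(-mu_total * s))

def s_q_numeric(mu_j, mu_total, s, steps=100_000):
    """Midpoint-rule evaluation of the integral in (11.13) for constant forces."""
    h = s / steps
    return sum(
        math.exp(-mu_total * (i + 0.5) * h) * mu_j * h for i in range(steps)
    )

horizon = 10.0  # hypothetical horizon, chosen only for the check
for mu_j in (0.02, 0.03):
    print(mu_j, s_q_constant(mu_j, 0.05, horizon), s_q_numeric(mu_j, 0.05, horizon))
```

The two cause-specific probabilities add up to the total probability of failure by time s, which is one minus the survival probability given by (11.12).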


11.5 The analogy with joint-life statuses 


We have already encountered a multiple-decrement model in Chapter 10. The difference is 
that, rather than single lives, the basic objects were joint-life statuses consisting of a pair of 
lives (x) and (y). These pairs were subject to two distinct causes of failure, namely the death 
of (x) and the death of (y). The notation differs somewhat, but the reader should notice that, 
with this point of view, the contingent insurance present value given in (10.24) is a special 
case of that given in (11.11), and the probability in (10.28) is a special case of that in (11.13). 

It follows that if we assume that the causes of decrement are acting independently, we 
can evaluate the integral (11.11) from the multiple-decrement table as in Chapter 10. That 
is, if the benefits are constant over each year, and the particular decrement can be assumed 
to be uniformly distributed over each year, the $i/\delta$ correction will result in a reasonable
approximation. 


11.6 A machine analogy 


The difficult part of multiple-decrement theory deals with relationships between the different 
decrements. In order to better understand and motivate the ideas, we first look at what appears 
to be a different situation, but which we will eventually relate to our model above. To facilitate 
this, we will use notation for this new example that parallels that which we have already 
introduced. 

Suppose we have a machine with two components, part 1 and part 2, which work com- 
pletely independently of each other so that the condition of one part does not affect the 
operation of the other. In order for the machine to work, both parts must be working, so if 
either part fails, the machine will fail, even though the other part may be in perfect order. 
We assume also that both parts cannot fail simultaneously, so we can always identify which 
part caused the machine to fail. Suppose we want to compute probabilities of failure over 
some time period, say a year. We have four quantities of interest. For j = 1, 2, let $q'^{(j)}$ be the
probability that part j will fail, and let $q^{(j)}$ be the probability that the machine will fail due
to the failure of part j, meaning that part j failed during the year and was the first of the two
parts to fail. What are the relationships between these? Since the failure of the machine due to
the failure of part j obviously implies that part j failed, it is clear from elementary probability
theory that

$$q'^{(j)} \ge q^{(j)}. \tag{11.14}$$

It is also clear that, in general, the two quantities are not equal and we would expect the left
hand side to be strictly greater than the right. Suppose, for example, that sometime during the
year part 1 fails, causing the machine to fail, and sometime after that but before the end of
the year, part 2 fails. Then the event of the failure of part 2 would have occurred, but not the
event of part 2 causing the failure of the machine.

After introducing some additional notation, we will state some further relationships. Let
$p'^{(j)} = 1 - q'^{(j)}$ denote the probability that part j will be working at the end of the year, $q^{(\tau)}$
denote the probability that the machine will fail within the year, and $p^{(\tau)} = 1 - q^{(\tau)}$ denote
the probability that the machine is working at the end of the year. The machine can fail in one
of two mutually exclusive ways, namely, failure of part 1 or failure of part 2. By elementary
probability theory we have

$$q^{(\tau)} = q^{(1)} + q^{(2)}.$$

On the other hand, in order that the machine be working at the end of the period, both parts must
be working. Since the parts work independently, we can multiply probabilities to calculate
this (as we illustrated in Chapter 10), which gives

$$p^{(\tau)} = p'^{(1)} p'^{(2)}.$$

From the above two equations we get the fundamental identity

$$q^{(1)} + q^{(2)} = 1 - p'^{(1)} p'^{(2)}, \tag{11.15}$$

which is often written in the form

$$q^{(1)} + q^{(2)} = q'^{(1)} + q'^{(2)} - q'^{(1)} q'^{(2)}. \tag{11.16}$$

The basic problem of interest is as follows. If we are given the unprimed symbols, can we
calculate the primed ones, or if we are given the primed symbols can we calculate the unprimed
ones? In general, we cannot do this uniquely and we will need additional information in order
to definitely deduce one set of probabilities from the other. We will, however, present some
possible solutions to this problem, and then discuss some conditions that will lead to these
solutions.


11.6.1 Method 1 


Given $q^{(j)}$, j = 1, 2, we sum to get $q^{(\tau)}$, then calculate $p^{(\tau)}$, and then let

$$p'^{(j)} = \left(p^{(\tau)}\right)^{q^{(j)}/q^{(\tau)}}, \tag{11.17}$$

which will clearly satisfy (11.15). It is not too difficult to remember this formula. We have
to factor the quantity $p^{(\tau)}$ into two factors, and it is natural to take the factors as $p^{(\tau)}$ to some
exponent where the exponents sum up to one. Two numbers that obviously sum to 1 are
$q^{(j)}/q^{(\tau)}$, j = 1, 2.

For the other direction, given $q'^{(j)}$, j = 1, 2, we calculate $p'^{(j)}$, multiply to get $p^{(\tau)}$, then
calculate $q^{(\tau)}$, and finally take

$$q^{(j)} = \frac{\log p'^{(j)}}{\log p^{(\tau)}}\, q^{(\tau)}. \tag{11.18}$$

For this formula we have to split up $q^{(\tau)}$ as the sum of two terms. We obtain the terms by
multiplying by weights that add up to 1. Since we have a multiplicative relationship, we take
logs to accomplish this.

A straightforward calculation verifies that this approach is consistent. That is, if we are
given $q^{(j)}$, j = 1, 2, and use (11.17) to calculate $q'^{(j)}$, j = 1, 2, and then apply (11.18) to these
numbers, we will end up with the same values of $q^{(j)}$ as we started with.

To see that (11.14) is satisfied we use the inequality

$$(1 - x)^a \le 1 - ax, \quad \text{for } 0 \le x \le 1,\ 0 \le a \le 1,$$

which can be demonstrated by looking at the Taylor polynomial of degree 2 for the left
hand side. From (11.17), taking $a = q^{(j)}/q^{(\tau)}$,

$$p'^{(j)} = \left(1 - q^{(\tau)}\right)^a \le 1 - a\,q^{(\tau)} = 1 - q^{(j)},$$

and (11.14) follows.
The entire discussion above can be generalized to the case of m independent parts, where
the failure of any one part will cause the machine to fail. Use notation as above, except that
j now takes values 1, 2, ..., m. Everything is the same with the obvious changes to handle m
probabilities instead of two. For example, (11.15) will now read

$$\sum_{j=1}^{m} q^{(j)} = 1 - \prod_{j=1}^{m} p'^{(j)}, \tag{11.19}$$

and the solutions given by (11.17) and (11.18) will satisfy this relationship.
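The two directions of Method 1 translate directly into code. The following is a minimal sketch of (11.17) and (11.18) for any number of independent parts; the function names are illustrative, and it assumes the total failure probability lies strictly between 0 and 1 so that the logarithms and the division are defined.

```python
import math

def method1_primed_from_unprimed(q):
    """(11.17): p'^(j) = (p^(tau)) ** (q^(j) / q^(tau)); returns the q'^(j)."""
    q_tau = sum(q)
    p_tau = 1.0 - q_tau
    return [1.0 - p_tau ** (qj / q_tau) for qj in q]

def method1_unprimed_from_primed(q_primed):
    """(11.18): q^(j) = q^(tau) * log p'^(j) / log p^(tau)."""
    p_tau = math.prod(1.0 - qp for qp in q_primed)  # independence, as in (11.19)
    q_tau = 1.0 - p_tau
    return [q_tau * math.log(1.0 - qp) / math.log(p_tau) for qp in q_primed]

primed = method1_primed_from_unprimed([0.3, 0.1])
print([round(v, 3) for v in primed])
print([round(v, 3) for v in method1_unprimed_from_primed(primed)])
```

Applying one function after the other returns the starting rates up to floating-point error, which is the consistency property noted above.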


11.6.2 Method 2 


We present another method to deduce the unprimed probabilities from the primed, which in
certain cases may be more realistic than formula (11.18). However, for m > 2 it is not easy to
invert, in order to deduce the primed from the unprimed. To motivate this method, we look again at
the case of the joint-life statuses of Chapter 10. In fact we can directly visualize the machine
point of view in this situation. The ‘machine’ consists of the joint-life status (xy) with the two
distinct components (x) and (y). For our particular problem, suppose we are given $q'^{(1)}$ and
$q'^{(2)}$, and want to compute the probability that the failure of part 1 caused the machine to fail.

Reasoning as we did in deriving (10.19), we approximate this by computing the probability
that part 1 failed during the year and that at the middle of the year part 2 was still working.
Making an assumption of a uniform distribution of failures for each part, over each year,
which is analogous to our UDD assumption, we have, as in (10.19),

$$q^{(1)} = q'^{(1)} - \tfrac{1}{2} q'^{(1)} q'^{(2)}, \qquad q^{(2)} = q'^{(2)} - \tfrac{1}{2} q'^{(1)} q'^{(2)}, \tag{11.20}$$

which clearly satisfy (11.14) and (11.16).

To calculate the inverse formula, let $\Delta = q^{(1)} - q^{(2)}$. Subtracting the two equations in (11.20)
shows that $\Delta = q'^{(1)} - q'^{(2)}$, so from (11.20)

$$q^{(1)} = q'^{(1)} - \tfrac{1}{2} \left(q'^{(1)}\right)^2 + \tfrac{1}{2} \Delta\, q'^{(1)}.$$

We have a quadratic equation in $q'^{(1)}$, which can be solved by the usual formula to yield

$$q'^{(1)} = \frac{(2 + \Delta) - \sqrt{(2 + \Delta)^2 - 8 q^{(1)}}}{2}, \tag{11.21}$$

and directly from (11.20)

$$q'^{(2)} = q'^{(1)} - \Delta.$$
Example 11.2 Given $q^{(1)} = 0.27$, $q^{(2)} = 0.17$, find $q'^{(1)}$ and $q'^{(2)}$ by Method 2.


Solution. Here $\Delta = 0.1$, so

$$q'^{(1)} = \frac{2.1 - \sqrt{(2.1)^2 - 2.16}}{2} = 0.3, \qquad q'^{(2)} = 0.3 - 0.1 = 0.2.$$

As a check we can take these answers and apply (11.20) to get back to where we started:

$$q^{(1)} = 0.3 - \tfrac{1}{2}(0.06) = 0.27, \qquad q^{(2)} = 0.2 - \tfrac{1}{2}(0.06) = 0.17.$$

Example 11.3 Given $q'^{(1)} = 0.200$, $q'^{(2)} = 0.488$, estimate $q^{(1)}$ and $q^{(2)}$ by both methods.


Solution. By Method 1: Since $p'^{(1)} = 0.8$ and $p'^{(2)} = 0.512$, we have $q^{(\tau)} = 0.5904$ and, from
(11.18),

$$q^{(1)} = \tfrac{1}{4} \times 0.5904 = 0.1476, \qquad q^{(2)} = \tfrac{3}{4} \times 0.5904 = 0.4428.$$

Solution by Method 2: From (11.20),

$$q^{(1)} = 0.2000 - 0.0488 = 0.1512, \qquad q^{(2)} = 0.4880 - 0.0488 = 0.4392.$$

As with Method 1, Method 2 can also be extended to the case of m components to deduce
the unprimed from the primed, as we show later in Section 11.7.3. For example, with m = 3
we get

$$q^{(1)} = q'^{(1)} - \tfrac{1}{2}\left[q'^{(1)} q'^{(2)} + q'^{(1)} q'^{(3)}\right] + \tfrac{1}{3}\, q'^{(1)} q'^{(2)} q'^{(3)}. \tag{11.22}$$

Note that the coefficients of 1/2 and 1/3 are what is needed in order to satisfy (11.19).
With m > 2 there is no convenient direct method for the inverse process, and it would
have to be done by numerical approximation.
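A compact way to see where the coefficients 1/2 and 1/3 come from is to integrate, over the year, the product of the other parts' uniform survival probabilities. The sketch below does this for any number of parts by expanding the polynomial in t, and also implements the two-part inverse (11.21). The function names are illustrative and not taken from the text.

```python
import math

def method2_unprimed_from_primed(q_primed):
    """q^(j) = q'^(j) * integral over [0, 1] of prod_{i != j} (1 - t q'^(i)) dt."""
    result = []
    for j, qj in enumerate(q_primed):
        coeffs = [1.0]                      # polynomial in t, lowest degree first
        for i, qi in enumerate(q_primed):
            if i == j:
                continue
            # multiply the current polynomial by (1 - qi * t)
            coeffs = [c - qi * prev
                      for c, prev in zip(coeffs + [0.0], [0.0] + coeffs)]
        integral = sum(c / (k + 1) for k, c in enumerate(coeffs))
        result.append(qj * integral)
    return result

def method2_primed_from_unprimed_two(q1, q2):
    """Two-part inverse via the quadratic formula (11.21)."""
    delta = q1 - q2
    q1_primed = ((2 + delta) - math.sqrt((2 + delta) ** 2 - 8 * q1)) / 2
    return q1_primed, q1_primed - delta

print(method2_unprimed_from_primed([0.2, 0.488]))
print(method2_primed_from_unprimed_two(0.27, 0.17))
```

For three or more parts the forward direction is still a closed formula, while the inverse would need a numerical solver, as remarked above.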


11.7 Associated single-decrement tables 


11.7.1 The main methods 


We now return to the original setting dealing with a group of lives, to which we will apply
our ‘machine’ model. For a definite example start with a multiple-decrement table with two
decrements; cause 1 is death, and cause 2 is disability. Suppose we wish to use this table to
construct a regular single-life table relating only to death, as introduced in Chapter 3. Would
we be justified in simply taking $q_x^{(1)}$ as the mortality rate for age x in the single-life table? The
answer is a definite ‘no’. We stress again that $q_x^{(1)}$ is not the probability that (x) will die within
the year, but rather the probability that (x) will die within the year before becoming disabled.
The actual value of $q_x$ should be larger than $q_x^{(1)}$. The situation is the same as the machine model
in the previous section. A person could die during the year, after having already left the group
by reason of disability. In other words, the mortality rate for age x in the single-decrement
table is analogous to the primed rate in the machine model, and will be denoted by $q_x'^{(1)}$.

We may also want to compute $q_x'^{(2)}$, the probability that (x) will become disabled during
the year, assuming that no other causes of failure are operating. This admittedly requires a
stretch of the imagination, as the possibility of death would always appear to be present. We
have to imagine, however, that we are computing what these probabilities would be if we
could somehow eliminate the possibility of death. The best approach is simply to view things
in terms of the machine model. That is, we have to think of a person as having a ‘disability
component’ that exists independently and could in fact continue to operate even after death.

For each cause j, the collection of values $q_x'^{(j)}$ for various values of x is known as the
associated single-decrement table for cause j. As we have stressed above, these rates give
probabilities of failure for the particular cause j, assuming that no other causes of decrement
are operating.

The problem often arises of going from one set of rates to another. We might, for example, 
construct the multiple-decrement table in the first place by using our knowledge of the 
associated single-decrement tables. Alternatively, we might have first constructed a multiple 
table by actually observing the effect of the various causes acting together and wish to use that 
to construct the associated single-decrement tables. In order to do so we can directly apply 
our machine model and use the two given methods, provided we make the key assumption of 
that model, namely that the various causes are acting independently. One can certainly argue 
in many applications that this does not hold, but the standard actuarial model incorporates 
this independence assumption. The more general theory is discussed in Chapter 17.

Consider a numerical example. Suppose that in our death-disability table

$$\ell_x^{(\tau)} = 1000, \qquad d_x^{(1)} = 300, \qquad d_x^{(2)} = 100.$$

We will compute the associated single-decrement rates using Method 1. Here $q_x^{(1)} = 0.3$,
$q_x^{(2)} = 0.1$ and $p_x^{(\tau)} = 0.6$, so from formula (11.17) we have

$$q_x'^{(1)} = 1 - 0.6^{3/4} = 0.318,$$

$$q_x'^{(2)} = 1 - 0.6^{1/4} = 0.120.$$

Suppose we are given the above single-decrement rates and want to compute the multiple-
decrement table. If we use the Method 1 formula (11.18) we necessarily will get back to the
unprimed rates of 0.300 and 0.100 that we started with, as the reader should verify. If we use
the Method 2 formula (11.20) instead, we will get

$$q_x^{(1)} = 0.318(1 - 0.06) = 0.299,$$

$$q_x^{(2)} = 0.120(1 - 0.159) = 0.101,$$

verifying, as we saw in Example 11.3, that the two methods are different but will generally
give answers that are close.


11.7.2 Forces of decrement in the associated single-decrement tables 


Refer back to the statement made in Section 11.4 that the multiple-decrement model is often
constructed from the forces of decrement. A question that naturally arises is, how are these
forces of decrement obtained? The answer is that they can be taken as the forces of decrement
in the associated single-decrement tables, namely

$$\mu_x'^{(j)}(t) = \frac{-\frac{d}{dt}\, {}_tp_x'^{(j)}}{{}_tp_x'^{(j)}}.$$

Note that $\mu_x'^{(j)}(t)$ is defined differently from the quantity $\mu_x^{(j)}(t)$ as given in (11.9), but
there is reason to believe that they might be the same. We observed above that we expect that
$q_x'^{(j)} > q_x^{(j)}$ since the person may fail within the year from cause j after first failing from another
cause, but this argument does not hold when we are speaking of instantaneous rates of failure
rather than failure over a period of positive length. This by itself is not sufficient for equality
(see Exercise 11.15), and we need the independence hypothesis. In Section 17.4.5, we give
a formal proof and provide an explanation of the fact that by virtue of independence,

$$\mu_x'^{(j)}(t) = \mu_x^{(j)}(t). \tag{11.23}$$


We can now summarize the main conclusions which follow from our independence 
hypothesis. 


* Given the forces of decrement in the associated single-decrement tables, we can
uniquely determine the multiple-decrement model, using (11.23) together with the
procedure of Section 11.4.

* Given only the rates of decrement in the associated single-decrement tables, we cannot
uniquely determine the multiple-decrement model, and must choose from different
methods, the most notable being the two described in Section 11.7.1.


11.7.3 Conditions justifying the two methods 


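In this section we give conditions to justify Method 1 and Method 2 introduced above,
and also some additional procedures that apply in special cases. We start with the easier case.


Method 2 This will hold provided we have a uniform distribution of decrements in each of
the associated single-decrement tables. That is, for an integer x, 0 < t < 1, and j = 1, 2, ..., m,

$${}_tq_x'^{(j)} = t\, q_x'^{(j)}. \tag{11.24}$$

To show this, we first note that as in (8.16)

$${}_tp_x'^{(j)} = e^{-\int_0^t \mu_x'^{(j)}(r)\, dr}, \tag{11.25}$$

and then, invoking (11.23) and applying (8.20),

$${}_tp_x'^{(j)}\, \mu_x^{(j)}(t) = {}_tp_x'^{(j)}\, \mu_x'^{(j)}(t) = q_x'^{(j)}. \tag{11.26}$$

From (11.13), in the case of two decrements, and since (11.12), (11.23) and (11.25) give
${}_tp_x^{(\tau)} = {}_tp_x'^{(1)}\, {}_tp_x'^{(2)}$,

$$q_x^{(1)} = \int_0^1 {}_tp_x^{(\tau)}\, \mu_x^{(1)}(t)\, dt = \int_0^1 {}_tp_x'^{(2)}\, {}_tp_x'^{(1)}\, \mu_x^{(1)}(t)\, dt.$$

Now substitute from (11.24) and (11.26) and integrate to obtain the Method 2 formula. The
same procedure works for the general case of m decrements. We just need to include additional
factors of $(1 - t\, q_x'^{(i)})$ in the integrand.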
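As a sketch of the integration step for two decrements, the following worked lines use only (11.24), in the form ${}_tp_x'^{(2)} = 1 - t\,q_x'^{(2)}$, and (11.26); the outcome agrees with the Method 2 formula (11.20).

```latex
\begin{align*}
q_x^{(1)} &= \int_0^1 {}_tp_x'^{(2)} \left( {}_tp_x'^{(1)}\, \mu_x^{(1)}(t) \right) dt \\
          &= \int_0^1 \left( 1 - t\, q_x'^{(2)} \right) q_x'^{(1)} \, dt \\
          &= q_x'^{(1)} \left( 1 - \tfrac{1}{2}\, q_x'^{(2)} \right).
\end{align*}
```

With a third decrement the integrand gains the factor $(1 - t\,q_x'^{(3)})$, and integrating the resulting quadratic in t produces the coefficients 1/2 and 1/3 seen in (11.22).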


Method 1 We need a preliminary definition. We say that the Uniform Ratio Hypothesis
(UR) holds at an integer x if there are constants $K_j$ for j = 1, 2, ..., m such that for 0 < t < 1,

$$\frac{\mu_x^{(j)}(t)}{\mu_x^{(\tau)}(t)} = K_j.$$

As an example in the two-decrement case, let $\mu_x^{(1)}(t) = 2\mu_x^{(2)}(t)$ for 0 < t < 1. Then UR holds
at x with $K_1 = 2/3$, $K_2 = 1/3$.

Suppose now that UR holds at x. From (11.13) with s = 1,

$$q_x^{(j)} = K_j \int_0^1 {}_tp_x^{(\tau)}\, \mu_x^{(\tau)}(t)\, dt = K_j\, q_x^{(\tau)}. \tag{11.27}$$

Finally, (11.23) and (11.27) show that

$$p_x'^{(j)} = e^{-\int_0^1 \mu_x'^{(j)}(r)\, dr} = e^{-\int_0^1 \mu_x^{(j)}(r)\, dr} = e^{-K_j \int_0^1 \mu_x^{(\tau)}(r)\, dr} = \left(p_x^{(\tau)}\right)^{K_j} = \left(p_x^{(\tau)}\right)^{q_x^{(j)}/q_x^{(\tau)}},$$

which is the Method 1 formula for going from $q^{(j)}$ to $q'^{(j)}$.

There are certain assumptions that will imply UR. One obvious condition is that the forces
of failure for each decrement are constant over the year from age x to age x + 1. That is, for
0 < t < 1 and j = 1, 2, ..., m, $\mu_x^{(j)}(t)$ is independent of t.

UR also holds at all ages under the assumption of a uniform distribution of deaths in the
multiple-decrement table. That is, for an integer x, 0 < t < 1 and j = 1, 2, ..., m,

$${}_tq_x^{(j)} = t\, q_x^{(j)}. \tag{11.28}$$

If (11.28) holds, we differentiate (11.13) with respect to s to conclude that for 0 < s < 1,

$$q_x^{(j)} = {}_sp_x^{(\tau)}\, \mu_x^{(j)}(s).$$

Adding this equality for all j,

$$q_x^{(\tau)} = {}_sp_x^{(\tau)}\, \mu_x^{(\tau)}(s).$$

Dividing the first equation above by the second shows that UR holds at x.

It should be noted that condition (11.28) is a somewhat unnatural assumption which differs
from (11.24), and is therefore inconsistent with the usual assumptions of UDD for each single
decrement. To see this, suppose that we assume (11.28) in a situation with two decrements
where $q_x'^{(j)} = 0.1$, for j = 1, 2. Then Method 1 applies, so from (11.18), $q_x^{(j)} = 0.095$. Our
assumption then gives ${}_{1/2}q_x^{(j)} = 0.0475$. Now we can also apply Method 1 to rates over half-
year periods. From (11.17) we have ${}_{1/2}q_x'^{(j)} = 1 - 0.905^{1/2} = 0.0487$. However, the assumption
of UDD for each decrement would give ${}_{1/2}q_x'^{(j)} = 0.05$.
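The half-year comparison above is easy to reproduce numerically. The following sketch simply retraces the arithmetic for two decrements with equal single-decrement rates of 0.1; the variable names are illustrative.

```python
import math

q_primed = 0.1                                   # q'^(j) for j = 1, 2
p_tau = (1 - q_primed) ** 2                      # independence: p^(tau) = p'^(1) p'^(2)
q_tau = 1 - p_tau
q_j = q_tau * math.log(1 - q_primed) / math.log(p_tau)   # (11.18)

half_q_j = 0.5 * q_j                             # (11.28) at t = 1/2
half_q_tau = 2 * half_q_j                        # both causes, first half-year
# (11.17) applied over the half-year period
half_q_primed_method1 = 1 - (1 - half_q_tau) ** (half_q_j / half_q_tau)
half_q_primed_udd = 0.5 * q_primed               # (11.24) at t = 1/2

print(round(q_j, 4), round(half_q_j, 4),
      round(half_q_primed_method1, 4), round(half_q_primed_udd, 4))
```

The last two printed values differ in the third decimal place, which is the inconsistency between (11.28) and UDD in each single-decrement table.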


Special Cases There are decrements that by their nature will not occur uniformly over the
year, but rather at definite discrete points. A prime example is withdrawal from an insurance
policy. Nobody is likely to withdraw in the middle of a premium paying period so that these can
be assumed to occur only on premium due dates. If we assume that all decrements either follow
such a discrete pattern or are uniformly distributed over each year, there is a fairly simple
procedure to move from the primed to unprimed rates. This is illustrated by the following two
examples.


Example 11.4 In a double-decrement model with decrements of (d) for death and (w) for
withdrawal, we are given that deaths are uniformly distributed over each year of age in the
single-decrement table, while one-third of the withdrawals in any year take place in the middle
of the year and two-thirds occur at the end of the year. Given that $q_x'^{(d)} = 0.20$, $q_x'^{(w)} = 0.36$,
calculate $q_x^{(d)}$ and $q_x^{(w)}$.

Solution. To calculate $q_x^{(w)}$ we add the probability of withdrawal before death in the middle
of the year to the probability of withdrawal before death at the end of year. The probability of
a withdrawal in the middle of the year is (1/3)(0.36) = 0.12. In order that this occurs before
death, we require that death has not yet occurred by the middle of the year, which by the
uniform distribution assumption has a probability of 1 − (1/2)(0.20) = 0.9. Arguing similarly
for the end of the year we have that

$$q_x^{(w)} = 0.12(0.9) + 0.24(0.8) = 0.300.$$

Since $q_x^{(\tau)} = 1 - (0.8)(0.64) = 0.488$, we know that $q_x^{(d)} = 0.188$.

As a check we can calculate the latter figure directly. The probability of death before
withdrawal in the first half of the year is just the probability of death in the first half of the
year, since there are no withdrawals during that time. This is equal to 0.10. The probability
of death before withdrawal in the second half of the year is the probability of death in the
second half, multiplied by the probability that withdrawal did not take place in the middle
of the year. This equals (0.1)(1 − 0.12) = 0.088. The sum of the two probabilities is indeed
equal to 0.188.

Note that in both calculations above, we used the independence assumption in order to
multiply relevant probabilities.


Example 11.5 In a multiple-decrement model with three decrements, failures from causes
1 and 2 both occur uniformly over each year in the single-decrement table. For cause 3, 60%
of the failures in any year occur 1/4 of the way through the year and the other 40% occur
3/4 of the way through the year. Find formulas that give the unprimed rates in terms of the
primed.


Solution. To simplify the notation, fix any age x and let a', b', c' denote $q_x'^{(j)}$ for j = 1, 2, 3,
respectively, and let a, b, c denote $q_x^{(j)}$ for j = 1, 2, 3, respectively. Proceed as in Example 11.4,
only now we need that survival from both decrements 1 and 2 occurred at the two points in
question. This gives

$$c = 0.6c'\left(1 - \tfrac{1}{4}a'\right)\left(1 - \tfrac{1}{4}b'\right) + 0.4c'\left(1 - \tfrac{3}{4}a'\right)\left(1 - \tfrac{3}{4}b'\right) = c' - \tfrac{9}{20}a'c' - \tfrac{9}{20}b'c' + \tfrac{21}{80}a'b'c'.$$

Formula (11.19) says that

$$a + b + c = 1 - (1 - a')(1 - b')(1 - c').$$

Therefore we know by symmetry (since there is nothing to distinguish decrements 1 and 2)
that

$$a = a' - \tfrac{1}{2}a'b' - \tfrac{11}{20}a'c' + \tfrac{59}{160}a'b'c',$$

$$b = b' - \tfrac{1}{2}a'b' - \tfrac{11}{20}b'c' + \tfrac{59}{160}a'b'c'.$$

There is a much longer procedure for the first two decrements, which might be done as a
check. To illustrate with decrement 1, we use the formula

$$a = \int_0^1 a'(1 - tb')\, {}_tp_x'^{(3)}\, dt,$$

which follows from (11.13) after substituting from (11.26). Calculation of this will necessitate
integrating over several intervals, since the formula for ${}_tp_x'^{(3)}$ changes. We get

$$a = a'\left[\int_0^{1/4} (1 - tb')\, dt + \int_{1/4}^{3/4} (1 - tb')(1 - 0.6c')\, dt + \int_{3/4}^{1} (1 - tb')(1 - c')\, dt\right].$$

We leave it to the reader to verify that this gives the same answer as above.
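The procedure used in Examples 11.4 and 11.5 can be automated. The sketch below is one possible arrangement, not the text's own algorithm: causes are either uniform over the year or concentrated at fixed times, the function and argument names are hypothetical, and it assumes independent causes with no two discrete causes sharing a time. Exact fractions are used so the results can be compared with the formulas above.

```python
from fractions import Fraction as F

def _poly_integral(qs, a, b):
    """Exact integral over [a, b] of prod_i (1 - t * q_i), via its coefficients."""
    coeffs = [F(1)]                                   # lowest degree first
    for q in qs:                                      # multiply by (1 - q t)
        coeffs = [c - q * prev
                  for c, prev in zip(coeffs + [F(0)], [F(0)] + coeffs)]
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))

def unprimed_rates(uniform, discrete):
    """uniform: {cause: q'}; discrete: {cause: (q', {time: share of q'})}."""
    def surv(cause, t, before):
        # single-decrement survival of a discrete cause just before or just after t
        q, shares = discrete[cause]
        return 1 - q * sum(w for s, w in shares.items()
                           if s < t or (s == t and not before))

    cuts = sorted({F(0), F(1)} | {s for _, sh in discrete.values() for s in sh})
    out = {}
    for j, qj in uniform.items():
        others = [q for i, q in uniform.items() if i != j]
        total = F(0)
        for a, b in zip(cuts, cuts[1:]):
            weight = F(1)
            for dcause in discrete:
                weight *= surv(dcause, a, before=False)   # constant on (a, b)
            total += weight * _poly_integral(others, a, b)
        out[j] = qj * total
    for dcause, (qd, shares) in discrete.items():
        total = F(0)
        for s, w in shares.items():
            term = w * qd
            for q in uniform.values():
                term *= 1 - s * q                     # uniform cause not yet failed
            for other in discrete:
                if other != dcause:
                    term *= surv(other, s, before=True)
            total += term
        out[dcause] = total
    return out

# Data of Example 11.4: deaths uniform, withdrawals at mid-year and year end.
ex4 = unprimed_rates({"d": F(1, 5)},
                     {"w": (F(9, 25), {F(1, 2): F(1, 3), F(1): F(2, 3)})})
print(float(ex4["d"]), float(ex4["w"]))
```

The same function handles the pattern of Example 11.5 by passing two uniform causes and one discrete cause with shares 3/5 and 2/5 at times 1/4 and 3/4.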


11.7.4 Other approaches 


We will describe two additional methods, which were popular in pre-computer days for
calculating primed quantities from the unprimed. Neither of them can be easily inverted to
give the unprimed in terms of the primed. They do satisfy (11.14) but their main drawback
is that they do not satisfy the consistency relation (11.15) which must hold if we assume that
the causes operate independently. On the other hand, they give approximate answers quickly
and can give additional insights into the difference between the primed and unprimed rates.

Looking at the particular data given above in Section 11.7.1, we see that in order to
compute $q_x'^{(1)}$ we want to estimate how many of those 100 people who became disabled
during the year will die during the year after becoming disabled. Suppose we postulate that,
on average, people leave from disability in the middle of the year. Since the chance of leaving
by death during the year is 0.30, there should be approximately a chance of 0.15 of dying in
the second half of the year. In other words, approximately 15 of the 100 disabilities would
die during the year after becoming disabled and we could estimate $q_x'^{(1)} = 0.315$, not too far
from the result obtained above by Method 1.

Our final method gives a more plausible argument for decrements other than death, for
as we already noted, it is difficult to imagine someone becoming disabled after they have
died. The argument is as follows. The probability of death during the year should be obtained
by taking the number of people who died during the year and dividing by the number of
people who were in the observation period for that year. In the above example we observed
300 deaths. However, we did not really have a full 1000 people under observation for the
entire year. Some of the group dropped out due to disability, and were no longer observable
as potential deaths. Consider extreme cases. If all the people becoming disabled did so at
the beginning of the year we would have only 900 people left. If all of them dropped out
at the end of the year, then we would indeed have the full 1000 under observation. We
can conclude that on average we would have the equivalent of 950 people to observe and we
can estimate $q_x'^{(1)} = 300/950 = 0.316$. Similarly, we could estimate $q_x'^{(2)} = 100/850 = 0.118$.
Again, we obtain answers close to those of Method 1.

Both of these alternate methods can be applied to the general case of m decrements. To
state the formulas it is convenient to define, for each cause j,

$$q_x^{(-j)} = \sum_{i \ne j} q_x^{(i)},$$

the probability that (x) will leave the group within the next year from some cause other than
j. Our first argument above then gives the approximation

$$q_x'^{(j)} = q_x^{(j)}\left(1 + \tfrac{1}{2} q_x^{(-j)}\right), \tag{11.29}$$

while the second argument leads to the approximation

$$q_x'^{(j)} = \frac{q_x^{(j)}}{1 - \tfrac{1}{2} q_x^{(-j)}}. \tag{11.30}$$

These general formulas confirm that the two latter methods give different approximations, but
the answers are usually close, since for a small value of a, 1 + a is close to $(1 - a)^{-1}$.

The reader may find it instructive to take the case of two decrements and substitute
algebraically into the right hand side of (11.16) from both (11.29) and (11.30). In both cases
one obtains a result that is somewhat less than the left hand side of (11.16), verifying the
inconsistency. However, we see that the difference is likely to be small. It is a sum of terms
each of which is close to the product of three or more $q^{(j)}$s.
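The two approximations and the size of their inconsistency can be checked numerically on the death and disability data of Section 11.7.1. This is a small sketch with illustrative function names; `consistency_gap` measures the left hand side of (11.19) minus its right hand side.

```python
import math

def approx_mid_year(q):
    """(11.29): q'^(j) is approximated by q^(j) * (1 + q^(-j) / 2)."""
    total = sum(q)
    return [qj * (1 + (total - qj) / 2) for qj in q]

def approx_exposure(q):
    """(11.30): q'^(j) is approximated by q^(j) / (1 - q^(-j) / 2)."""
    total = sum(q)
    return [qj / (1 - (total - qj) / 2) for qj in q]

def consistency_gap(q, q_primed):
    """Sum of the unprimed rates minus (1 - product of the primed survival rates)."""
    return sum(q) - (1 - math.prod(1 - qp for qp in q_primed))

q = [0.3, 0.1]
for approx in (approx_mid_year, approx_exposure):
    q_primed = approx(q)
    print(approx.__name__, [round(v, 3) for v in q_primed],
          round(consistency_gap(q, q_primed), 6))
```

A positive gap means the right hand side falls short of the left, in line with the remark above; Method 1 and Method 2 would give a gap of zero.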


Notes and references 


Promislow (1991) discusses select multiple-decrement tables. 


Exercises 


Type A exercises 


11.1 You are given the following portion of a double-decrement table. (Blanks indicate
data that you must calculate from the given figures if you need them.)


x      $\ell_x^{(\tau)}$    $d_x^{(1)}$    $d_x^{(2)}$
50     -       100     300
51     700     50      -
52     470     40      -
53     320


Find the probability that (50) will fail first from cause 2 between the ages of 51
and 53.

11.2 You are given the following portion of a double-decrement table. (Blanks are as in
Exercise 11.1.)


x      $\ell_x^{(\tau)}$    $d_x^{(1)}$    $d_x^{(2)}$
50     1200    100     300
51     -       200     -
52     300


(a) A 2-year insurance contract on (50) provides for benefits paid at the end of the
year of failure if this occurs within 2 years. The benefit payable is 1 unit if failure
is from cause 1, or 2 units if it is from cause 2. If the interest rate is a constant
50% per year, find the present value of the benefits.

(b) Find the associated single-decrement rates, $q_{51}'^{(1)}$ and $q_{51}'^{(2)}$, using the Method 1
approximation.


11.3 In a table with three decrements you are given that, for all $x \in [0, 100)$,
$\ell_x^{(1)} = 10(100 - x)$, $\ell_x^{(2)} = 20(100 - x)$, $\ell_x^{(3)} = 30(100 - x)$.
Find the probability that (50) will fail first from cause 1 between the ages of 60
and 70.

11.4 Suppose that in a double-decrement model, you are given the associated single-
decrement rates

$$q_x'^{(1)} = 0.3, \qquad q_x'^{(2)} = 0.51.$$

Compute $q_x^{(1)}$ and $q_x^{(2)}$ by Method 1 and Method 2.


11.5 A disability insurance policy provides for payments at the moment of disability should 
this occur within 10 years. The amount of the benefit at time t is e®!%, The policy is 
purchased by level annual premiums payable continuously for 10 years until either 
death or disability occurs. Nothing is paid on this policy if the insured dies before 
becoming disabled. If the force of disability is a constant 0.03, the force of mortality 
is a constant 0.06, and the force of interest is a constant 0.05, find the annual rate of 
premium payment. 


11.6 Given $q^{(1)} = 0.05$, $q^{(2)} = 0.08$, $q^{(3)} = 0.10$, use Method 1 to find $q'^{(j)}$, j = 1, 2, 3.

11.7 Given $q'^{(1)} = 0.1$, $q'^{(2)} = 0.2$, $q'^{(3)} = 0.25$, use both Method 1 and Method 2 to find
$q^{(j)}$, j = 1, 2, 3.


Type B exercises


11.8 Suppose that in a double-decrement model, $q_x'^{(1)} = q_x'^{(2)}$. Show that Method 1 and
Method 2 lead to the same answer for the unprimed rates.


11.9 In a double-decrement table, with decrements d for death and w for withdrawal, we have
that $q^{(d)}_x$ is a constant 0.10 for all x and $q^{(w)}_x$ is a constant 0.02 for all x. The rate of
interest is a constant 0.06. An insurance policy pays 1 at the end of the year of death.
Nothing is paid to a withdrawing policyholder. Find the present value of the benefits
on this policy.

11.10 In a multiple-decrement model with two causes of decrement, you are given associated
single-decrement rates of $q'^{(1)}_x = 0.20$, $q'^{(2)}_x = 0.36$. Calculate the double-decrement
rates $q^{(1)}_x$ and $q^{(2)}_x$ in each of the following three cases: (a) by Method 1; (b) by
Method 2; (c) assuming that failures from cause 1 are uniformly distributed over the
year, but that the failures from cause 2 all take place at a point three quarters of the
way through the year.

11.11 In the case of two decrements, compute the difference between the right hand side
and the left hand side of (11.16) in terms of $q'^{(1)}_x$ and $q'^{(2)}_x$, under each of (11.29) and
(11.30).

11.12 Is the following statement true or false?

$$p^{(j)}_x + q^{(j)}_x = 1.$$

If false, give a correct version.

11.13 In a model with two decrements, you are given that, for all x, $\mu^{(1)}_x(t) = 0.02$ for all t
and $\mu^{(2)}_x(t) = (10 - t)^{-1}$ for 0 < t < 10. Find 44.

11.14 In a model with two decrements you are given that $q^{(1)}_x = 0.1274$, $q^{(2)}_x = 0.1674$ and
that there is a uniform distribution of decrements in each year for both of the
single-decrement tables. Find $q'^{(1)}_x$ and $q'^{(2)}_x$.

11.15 Consider a double-decrement model, where we relax the hypothesis that the causes
operate independently. Cause 1 is death, and failure from cause 2 occurs at time t if
death occurs at time t - 1. Show that (11.23) does not hold.

11.16 In a multiple-decrement model with 4 decrements, $q'^{(i)}_x = i/10$, i = 1, 2, 3, 4. Each
decrement is uniformly distributed over each year in the single-decrement table.
Calculate $q^{(j)}_x$, j = 1, 2, 3, 4.

11.17 For an insurance contract with quarterly premiums, withdrawals will occur at times
1/4, 1/2, 3/4 and 1. Studies show that at a particular age x, one-third of the withdrawals
occur at time 1/4, one-third occur at time 1, one sixth occur at time 1/2 and one sixth
occur at time 3/4. Assume that deaths are distributed uniformly over the year. If
$q'^{(d)}_x = 0.12$ and $q'^{(w)}_x = 0.20$, find $q^{(d)}_x$ and $q^{(w)}_x$. (Here, d denotes death and w denotes
withdrawal.)

11.18 A multiple-decrement model has 3 decrements. For decrement 1, 60% of the failures
occur at time 1/3, and 40% occur at time 2/3. For decrement 2, failures occur either at
times 1/4 or 3/4 in equal numbers. For decrement 3, failures are uniformly distributed
over each year. Find formulas giving the unprimed rates in terms of the primed rates.


12 


Expenses and profits 


12.1 Introduction 


Provisions for expenses and profits are two important features that we have ignored in our
models up to now. Premiums paid on insurance and annuity contracts must not only provide 
benefits, but must also contain an extra amount to cover the expenses of operating the business. 
In addition the premiums must be sufficient to generate profits as a return to the investors who 
provide the capital necessary to start an insurance operation. 

We will begin with a discussion of expenses, and the first step is to distinguish between 
three basic categories of regular periodic expenses. These can depend on the premiums, on 
the amount of the benefits, or be constant per policy. The major expense of the first type is 
commissions paid to the agents selling the policies. It is traditional for their compensation to 
take the form of certain percentages of the premiums. There are various expenses that will 
depend on the amount of the benefits. For example, larger policies will require additional 
efforts and expenses in the selection procedure to verify that the individual is a sound risk. 
Finally, there are expenses, such as setting up records for a new policy, that are largely 
independent of the size of premiums or benefits, and are fixed for each policy. 

A peculiar feature of expenses for life insurance is that a large portion of these are incurred 
in the first year of the policy. Commissions paid on the initial premium are traditionally much 
higher than those paid in subsequent years (known as renewal premiums). The expenses of 
selection and of the setting up of policy records also occur at the beginning. One consequence 
of this is that the amounts to be paid upon withdrawal become an important consideration. The 
potential loss on withdrawal must be taken into account, since if a policyholder withdraws 
early, there may be no chance for the high initial expenses to be recovered through the 
premiums. 

The expenses mentioned above will be paid at fixed times. But there are also extra expenses 
incurred when a loss occurs and a claim is reported to the insurer. The insurer must investigate 
to ensure that the claim is legitimate and then arrange to disburse the benefits. Some of these
expenses would depend on the policy amount, as more care would be taken with a large policy.
Others would be fixed. These are often referred to as settlement expenses. 

For a typical life insurance or annuity policy, the mathematical treatment of expenses is 
quite straightforward. Suppose the insurer determines that an expense $e_k$ will be incurred at
each time k = 0, 1, .... Then the insurer is simply providing an additional life annuity with
benefit vector $e = (e_0, e_1, \ldots)$. Similarly the expenses of claim settlement would be handled
by an increase to the death benefit. Of course the policyholder does not receive these amounts. 
They are paid to those providing the goods or services involved in the expense, but they still 
must be provided for from the premiums. 

Premiums which make provisions for expenses as well as benefits will be referred to as 
expense-augmented premiums. Some authors use the term gross premiums. We will however 
reserve this term for the premiums actually charged, which as pointed out in Section 4.6.1, 
can take into account factors other than the benefits and expenses, such as profits, which we
discuss later on in this chapter. 


Example 12.1 A whole-life policy issued at age x, with level annual premiums paid for 
life, provides for a death benefit paid at the moment of death. The expenses are as follows: in 
the first year, 70% of the initial premium, 1% of the face amount, and 30 per policy; in years 
2 to 10, 10% of each premium, 0.5% of the face amount, and 10 per policy; after 10 years, 
5% of each premium, 0.2% of the face amount and 5 per policy; the settlement expense is
100 per policy plus 0.5% of the face amount. Assume that the expenses in any year are paid 
at the beginning of the year. 


Find a formula to compute the annual expense-augmented premium for a policy with a 
constant death benefit of 200 000. 


Solution. Let G be the expense-augmented premium. We simply proceed as we did in 
Chapter 5, equating the present value of premiums with the present value of the death benefits 
plus the present value of the additional expense benefits. Of course the latter depend partly 
on G, which just means we have to solve a simple equation at the end. 

It is best to consider two expense vectors. These are 


$$e = (2030,\ 1010_9,\ 405_\infty),$$
which covers the expenses depending on face amount as well as the fixed expenses, and
$$r = (0.7,\ 0.1_9,\ 0.05_\infty),$$

which gives the proportion of the premium for premium-based expenses. The settlement
charges mean that the amount paid on death is effectively 201 100. We then have the equation

$$G\ddot{a}_x = 201\,100\,\bar{A}_x + \ddot{a}_x(e) + G\ddot{a}_x(r),$$
from which we get

$$G = \frac{201\,100\,\bar{A}_x + \ddot{a}_x(e)}{\ddot{a}_x(0.3,\ 0.9_9,\ 0.95_\infty)}. \tag{12.1}$$

Note that the vector in the denominator is just $(1_\infty - r)$.
A more realistic model would also involve withdrawal. We would start with a double-decrement
table with decrements d for death and w for withdrawal. We would also need a cash
value vector cv, where $cv_k$ would equal the cash value given to a policyholder who withdraws
at time k + 1. The subscript is chosen to correspond to the death benefit vector. (Note that we
can safely assume that for a policy with premiums paid annually, withdrawal will only occur
at an integer time, when a new premium is due.) We need as well a vector $e^w$ (standing for
the expenses of withdrawal) where $e^w_k$ would be the expenses incurred for withdrawal at time
k + 1, analogous to the settlement expenses added to the death benefit. Formula (12.1) would be
modified to

$$G = \frac{201\,100\,\bar{A}^{(d)}_x + A^{(w)}_x(cv + e^w) + \ddot{a}^{(\tau)}_x(e)}{\ddot{a}^{(\tau)}_x(1_\infty - r)}.$$
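Formula (12.1), the version without withdrawals, can be evaluated numerically once a mortality and interest basis is chosen. The sketch below is only an illustration: the constant force of mortality, the constant force of interest, the 200-year truncation of the whole-life vectors, and all function names are assumptions made here, not part of the text.

```python
import math

def annuity_value(payments, survival, v):
    """Present value of payments[k] made at time k, contingent on survival to time k."""
    return sum(c * survival[k] * v ** k for k, c in enumerate(payments))

def example_12_1_vectors(face, horizon):
    """Expense vectors e and r of Example 12.1, truncated at `horizon` years."""
    e = [0.01 * face + 30] + [0.005 * face + 10] * 9 + [0.002 * face + 5] * (horizon - 10)
    r = [0.70] + [0.10] * 9 + [0.05] * (horizon - 10)
    return e, r

def expense_augmented_premium(face, A_bar, survival, v, horizon):
    """Solve G * a(1 - r) = (face + settlement) * A_bar + a(e) for G, as in (12.1)."""
    e, r = example_12_1_vectors(face, horizon)
    death_amount = face + 100 + 0.005 * face          # benefit plus settlement expense
    one_minus_r = [1 - rk for rk in r]
    numerator = death_amount * A_bar + annuity_value(e, survival, v)
    return numerator / annuity_value(one_minus_r, survival, v)

# Hypothetical basis (an assumption, not from the text): constant forces.
MU, DELTA, HORIZON = 0.02, 0.05, 200
survival = [math.exp(-MU * k) for k in range(HORIZON)]
v = math.exp(-DELTA)
A_bar = MU / (MU + DELTA)      # insurance payable at the moment of death, constant forces
G = expense_augmented_premium(200_000, A_bar, survival, v, HORIZON)
print(round(G, 2))
```

Keeping the premium-based expenses in a separate vector r is what makes the final step a simple division: G appears on both sides of the equation only through the annuity evaluated on r.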


12.2 Effect on reserves 


Suppose that we compute reserves by adding expenses to the benefits and using expense- 
augmented premiums in place of net. The resulting quantities are known as expense-augmented 
reserves, while the reserves calculated in Chapter 6 are known as net premium reserves. It
is natural to ask how these two quantities compare. In the usual level premium policy,
the expenses constitute a decreasing sequence of benefit payments that are paid for by a level 
addition to premiums. The extra expense charge included in the premium is less than needed 
to cover the expenses in early years, and more in later years. This means that typically the 
expense-augmented reserves will be less than the net premium reserves. To put it another 
way, the high initial expenses are a receivable to the insurer that will be collected from future 
premiums, and this causes a reduction in liabilities. Insurance regulators have usually taken 
the viewpoint that one should not count on these receivables since the policyholder might 
withdraw and never pay them. It is traditional that reserves be calculated without taking into 
account expenses, and using net in place of expense-augmented premiums. So even if realistic 
mortality and interest assumptions were used in calculating reserves, ignoring expenses causes 
higher reserves, and will cause losses to be shown in the early years of a policy. 

Consider a simple example. Suppose that for a certain policy issued to (x) with benefits
paid at the end of the year of death, $b_0 = 1000$, $q_x = 0.2$, i = 0.10, the net premium payable
at time 0 is 250, the expense-augmented premium is 300, and the total expenses payable at
time 0 are 100. The expense-augmented premium reserve at time 1 would be

$$(300 - 100)\frac{1.10}{0.8} - 1000\,\frac{0.2}{0.8} = 25,$$

while the net premium reserve at time 1 is

$$250\,\frac{1.10}{0.8} - 1000\,\frac{0.2}{0.8} = 93.75.$$


Even if the mortality and interest matched exactly those of the reserve assumptions, the use 
of net premium reserves will result in a loss of 68.75 per policy in the first year. This arises 
from the fact that the provision for expenses is made by an extra charge of only 50 to the
level premium, but this does not cover the actual expense of 100. (Of course, as we showed
in Chapter 6, there will be corresponding gains in future years, when the actual expenses are 
less than the 50 allowed for in the premium.) 
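
The arithmetic of this small example can be checked with a few lines of code. This is a minimal sketch; the function name and argument names are chosen here for illustration.

```python
def reserve_after_one_year(premium, expenses, benefit, q, i):
    """Fund per survivor at time 1: accumulate (premium - expenses) for a year,
    pay the death benefits, and share what is left among the survivors."""
    return ((premium - expenses) * (1 + i) - benefit * q) / (1 - q)

# Expense-augmented basis: premium 300, expenses of 100 recognized.
expense_augmented = reserve_after_one_year(300, 100, 1000, 0.2, 0.10)
# Net premium basis: premium 250, expenses ignored.
net_premium = reserve_after_one_year(250, 0, 1000, 0.2, 0.10)
# Shortfall created by holding the net premium reserve.
first_year_loss = net_premium - expense_augmented

print(round(expense_augmented, 2), round(net_premium, 2), round(first_year_loss, 2))
```

Up to floating-point rounding, the three figures agree with the 25, 93.75 and 68.75 quoted in the text.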

To alleviate this distortion of the profitability in the first year, many regulatory bodies 
permit reserves to be calculated by methods that are generally known in North America as 
modified reserve systems. (Similar procedures are popular in some European countries and 
are known as Zillmerized reserves.) Recall the term valuation premium, introduced in Chapter 6,
which refers to the premiums used in calculating reserves. We stress again that these
are not necessarily (and in fact are almost never) the same as the premiums the insurer actually charges. The
modified systems still call for expenses to be ignored, but they allow for lower initial valuation 
premiums, followed by higher valuation premiums for the later durations, calculated so that 
valuation premiums remain actuarially equivalent to benefits. This effectively recognizes that 
the insurer has a smaller premium in the first few years to provide the benefits since a large 
amount of these premiums must go to pay the high initial expenses. 

There are many such systems. The most basic is the full preliminary term method. Suppose 
we have a life insurance policy with level premiums payable for n years, and benefits payable 
at the end of the year of death. The initial valuation premium is taken as $b_0 v(1) q_x$, the exact
amount needed to provide the benefits for the first year, followed by a level valuation premium
for all subsequent years. This will automatically make ${}_1V = 0$. This level valuation premium
for years after the first will be

$$\frac{A_{x+1}(b \circ 1)}{\ddot{a}_{x+1}(1_{n-1})},$$


which will provide for the benefits after the first year. This effectively means that the entire 
first-year valuation premium, less the amount needed for the benefits according to the reserve 
basis, is available to use for expenses. 
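
A small numerical sketch of the full preliminary term method follows. The three-year policy, its mortality rates and the 5% interest rate are hypothetical values chosen only for illustration, and a constant interest rate is assumed so that the discount function is a power of v.

```python
def full_preliminary_term(b, q, i):
    """Return (first-year, renewal) valuation premiums.

    b[k] is the benefit paid at time k + 1 for death in year k + 1,
    q[k] is the mortality rate for year k + 1, and premiums are payable
    at the start of each of the len(b) years.
    """
    v = 1 / (1 + i)
    alpha = b[0] * v * q[0]                 # exactly covers first-year benefits
    alive, pv_benefits, pv_annuity = 1.0, 0.0, 0.0
    for k in range(1, len(b)):              # values at time 1, given survival to time 1
        pv_annuity += alive * v ** (k - 1)
        pv_benefits += alive * q[k] * b[k] * v ** k
        alive *= 1 - q[k]
    return alpha, pv_benefits / pv_annuity

# Hypothetical data, not from the text.
b = [1000.0, 1000.0, 1000.0]
q = [0.10, 0.20, 0.30]
i = 0.05
alpha, beta = full_preliminary_term(b, q, i)
reserve_1 = (alpha * (1 + i) - q[0] * b[0]) / (1 - q[0])
print(round(alpha, 2), round(beta, 2), round(reserve_1, 10))
```

The reserve at time 1 comes out as zero, as the method intends, and the two valuation premiums together remain actuarially equivalent to the benefits.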

There are several variations to full preliminary term. In some cases there is a recognition 
that for policies with high initial premiums, such as endowment insurance, or premiums 
payable over a limited period, this provides more expense allowance than is needed in the first 
year, and the first-year valuation premium is taken to be higher than $b_0 v(1) q_x$. In other cases there
may be more than one step. There will be a low premium used in the first year, a somewhat 
higher one for some duration, for example the next 10 years, and a still higher one after that. 
We will not discuss the details of the various methods here. Some of these are dealt with in 
the exercises. 


12.3 Realistic reserve and balance calculations 


As we have indicated, insurers are required to calculate reserves in accordance with the 
particular regulations that apply in their jurisdiction. These methods are intended mainly 
to ensure that policyholders will receive their promised benefits. The resulting reserves are often referred to
as statutory reserves. There are however many reasons for the insurer to calculate reserves
and balances on a more realistic basis, which uses expense-augmented or gross premiums and
expenses, rather than net premiums with expenses ignored, and which takes into account
withdrawals. A term that is sometimes used here is 'GAAP' reserve, the
acronym standing for generally accepted accounting principles. Such a reserve would then
be calculated as the present value of all future benefits, including death benefits, expenses,
and surrender benefits, less the present value of future premiums. It is useful to replace our basic
recursion formula (6.8) with the following more general version. For simplicity we assume
aggregate mortality, and benefits payable at the end of the year of death. Consider a policy
with annual premiums of $\pi_k$, death benefits of $b_k$, cash values of $cv_k$, an expense of $r_k$ per unit
of premium paid at time k, fixed expenses at time k of $e_k$, claim settlement expenses of $e^d_k$ in
the case of death between time k and time k + 1, and surrender expenses of $e^w_k$ for surrender at
time k + 1. Our basic recursion (6.8), when modified for expenses and withdrawal, becomes

$${}_{k+1}V = [{}_kV + \pi_k(1 - r_k) - e_k](1 + i_k) - q^{(d)}_{x+k}(b_k + e^d_k - {}_{k+1}V) - q^{(w)}_{x+k}(cv_k + e^w_k - {}_{k+1}V), \tag{12.2}$$

and solving for ${}_{k+1}V$, we obtain the Fackler type equation

$${}_{k+1}V = \frac{[{}_kV + \pi_k(1 - r_k) - e_k](1 + i_k) - q^{(d)}_{x+k}(b_k + e^d_k) - q^{(w)}_{x+k}(cv_k + e^w_k)}{1 - q^{(d)}_{x+k} - q^{(w)}_{x+k}}. \tag{12.3}$$


We want to discuss three main quantities which arise from the above recursion. They 
differ in the choice of premiums and initial values. 


Expense-augmented reserves. These reserves, as defined above, can be calculated from
(12.3) with $\pi_k$ equal to the expense-augmented premium.


Gross-premium reserves. These are simply the present value of future benefits and
expenses less the present value of future gross premiums. They can be calculated from (12.3)
with $\pi_k$ equal to the gross premium.


Asset shares. These are realistic balance calculations. They give the accumulated value of
the premiums less the accumulated value of the benefits and expenses. They can be calculated
from (12.3) using gross premiums, but now with an initial value of 0 at time 0. (Different
notation is often used here to distinguish them from reserves. We will write $AS_k$ in place of
${}_kV$.) The name comes from the fact that they represent that share of the insurer's assets which
is attributable to the particular contract.


We will use the following simplified example to illustrate the difference between these 
quantities. 


Example 12.2 A two-year policy on (x) provides for death benefits of 1000 paid at the end
of the year of death. Premiums of 160 are paid for 2 years. The expenses, incurred at the
beginning of each year, are 20% of the premium plus 18 in year 1, and 5 in year 2. You are given
that i = 100%, $q_x = 0.2$, $q_{x+1} = 0.3$. There are no withdrawals. Find the expense-augmented
reserves, gross-premium reserves, and asset shares.


Solution. Let G denote the expense-augmented premium. The present value of benefits and
expenses is

$$1000[(0.5)(0.2) + (0.25)(0.8)(0.3)] + 18 + 0.2G + 5(0.5)(0.8) = 180 + 0.2G.$$

Then G[1 + (0.5)(0.8)] = 180 + 0.2G, showing that G = 150.
For the expense-augmented reserves,

${}_0V = 0,$
${}_1V = [(0.8(150) - 18) \times 2 - 200]/0.8 = 5,$
${}_2V = 0.$

For gross-premium reserves,

${}_0V = 180 + 0.2(160) - 160(1.4) = -12,$
${}_1V = [(-12 + 0.8(160) - 18) \times 2 - 200]/0.8 = -5,$
${}_2V = 0.$

For the asset shares,

$AS_0 = 0,$
$AS_1 = [(128 - 18) \times 2 - 200]/0.8 = 25,$
$AS_2 = [(25 + 160 - 5) \times 2 - 300]/0.7 = 85.71.$
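
The three sets of figures in Example 12.2 all come from the same recursion (12.3); only the premium and the starting value change. The following sketch makes that explicit. The function and variable names are chosen here for illustration, and the withdrawal arguments default to zero because the example has no withdrawals.

```python
def next_balance(V, premium, r, e, i, q_d, b, e_d=0.0, q_w=0.0, cv=0.0, e_w=0.0):
    """One step of the Fackler type recursion (12.3)."""
    accumulated = (V + premium * (1 - r) - e) * (1 + i)
    outgo = q_d * (b + e_d) + q_w * (cv + e_w)
    return (accumulated - outgo) / (1 - q_d - q_w)

def project(initial_value, premium):
    """Apply the recursion over the two policy years of Example 12.2."""
    years = [dict(r=0.20, e=18.0, q_d=0.2), dict(r=0.0, e=5.0, q_d=0.3)]
    path = [initial_value]
    for y in years:
        path.append(next_balance(path[-1], premium, y["r"], y["e"], 1.00, y["q_d"], 1000.0))
    return path

expense_augmented = project(0.0, 150.0)    # premium G = 150, start at 0
gross_premium = project(-12.0, 160.0)      # gross premium 160, start at -12
asset_shares = project(0.0, 160.0)         # gross premium 160, start at 0

for name, path in [("expense-augmented", expense_augmented),
                   ("gross-premium", gross_premium),
                   ("asset shares", asset_shares)]:
    print(name, [round(x, 2) for x in path])
```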


In the following section we will revisit this example, and interpret the figures in terms of 
profit calculations. 


Remark For simplicity, some examples involving expenses will ignore withdrawals. This
does not necessarily mean that withdrawals are not allowed, as we stated in the previous example.
The implicit assumption is that at each duration, the cash value together with any expense of 
withdrawal is precisely equal to the reserve on the policy. In such a case the withdrawal rates 
and cash values have no effect on premiums or reserves, and need not be taken into account. 


12.4 Profit measurement 


12.4.1 Advanced gain and loss analysis


We want to continue the profit analysis we began in Chapter 6, but now incorporating expenses 
and withdrawals, and also by considering a more extensive context. 

We start with a very general setting. We are given a particular insurance or annuity 
policy, with a description of the benefits and cash values, and a set of reserves. The latter 
could be net premium reserves, expense-augmented reserves, gross-premium reserves, or even 
something else. In addition we are given a basis for all relevant factors, known as the profit test 
basis. These are interest rates, rates of mortality and withdrawal, expenses and the premiums
which will be charged. We want to determine the profit we will have in a specified period, 
if our actual experience is according to the given profit test basis and we use the specified 
reserves to measure profit. There are different contexts possible. The profit test basis could 
be a hypothetical test basis, to see what profits will emerge under our assumptions, and the 
major goal is to see if the proposed premiums will achieve a desired level of such profits. 
Alternatively this calculation could be carried out after the fact with the profit basis reflecting 
the actual observed experience, and the result will be the actual profit that is achieved.
In any event, for the period running from time k to k + 1 we have a profit of $Pr_{k+1}$ consisting
of the right hand side of (12.2) less the reserve ${}_{k+1}V$. That is

$$Pr_{k+1} = [{}_kV + \pi_k(1 - r_k) - e_k](1 + i_k) - q^{(d)}_{x+k}(b_k + e^d_k - {}_{k+1}V) - q^{(w)}_{x+k}(cv_k + e^w_k - {}_{k+1}V) - {}_{k+1}V \tag{12.4a}$$

$$= [{}_kV + \pi_k(1 - r_k) - e_k](1 + i_k) - q^{(d)}_{x+k}(b_k + e^d_k) - q^{(w)}_{x+k}(cv_k + e^w_k) - p^{(\tau)}_{x+k}\,{}_{k+1}V. \tag{12.4b}$$

The two versions represent two possible points of view, leading to the same result, as we
already indicated in Chapter 6. In (12.4a) the insurer sets up the reserve for everyone and
then needs to pay only the amount of the benefits above the reserve in the case of death or
surrender, while in (12.4b), the insurer pays the benefits in full but then needs to set up a
reserve only for the survivors. Version (12.4b) is normally the most efficient for calculation.
Yet another version is to write

$$Pr_{k+1} = [\pi_k(1 - r_k) - e_k](1 + i_k) - q^{(d)}_{x+k}(b_k + e^d_k) - q^{(w)}_{x+k}(cv_k + e^w_k) - \Delta(V_k), \tag{12.4c}$$
where
$$\Delta(V_k) = p^{(\tau)}_{x+k}\,{}_{k+1}V - (1 + i_k)\,{}_kV \tag{12.5}$$


represents the increase in reserves over the year. This version simply says that profit is what 
is left over after we accumulate the net amounts collected, and provide for benefit payments, 
expenses and the increase in reserves. 


Example 12.3 Refer back to Example 12.2 and calculate $Pr_k$ when the underlying
reserves are: (a) gross-premium reserves; (b) expense-augmented reserves. In each case, the
profit test basis is the same as the reserve basis.


Solution. (a) It is clear that $Pr_1 = Pr_2 = 0$. So what happened to the profits? The answer
is that with gross-premium reserves, and observed experience the same as the reserve basis,
they all occur at time 0. The reserve of -12 at this time means that the insurer could take
12 out of the company surplus for each such policy and still have enough left to provide for
the reserves. This shows that in addition to the definition of $Pr_k$ given above we must add the
statement that

$$Pr_0 = -{}_0V.$$

(b) Using (12.4b),

$$Pr_1 = (128 - 18)2 - 0.2(1000) - 0.8(5) = 16,$$
$$Pr_2 = (5 + 160 - 5)2 - 0.3(1000) = 20.$$

Note the difference in the incidence of profit for the different types of reserves. With
gross-premium reserves, all the profit is recognized at the beginning. With the expense-augmented
reserves, the profit emerges gradually as the extra amounts above the expense-augmented
premiums come in. In this case there is an extra 10 received in the first premium, but this
is reduced to 8 after the percentage of premium expense. This accumulates in 1 year to 16.
Similarly the extra 10 in the second premium accumulates to 20 at the end of the second year.
It will be instructive for the reader to verify that, in the long run, the reserve basis will not
affect the actual profits received.

This example also indicates that we can view asset shares as the sum of the expense-
augmented reserves plus an accumulation of the profits at interest and survivorship. For
example,

$$AS_2 = 0 + 8\left(\frac{4}{0.56}\right) + 10\left(\frac{2}{0.7}\right) = 85.71,$$

as calculated above.
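
The claim that the reserve basis changes only the timing of profit, not its accumulated amount, can be checked numerically for Example 12.2. The sketch below evaluates (12.4b) under both reserve bases and then accumulates the profits to time 2 with interest and survivorship. The function names are illustrative, and the convention that the time-0 profit equals minus the initial reserve is the one stated in Example 12.3.

```python
I = 1.00                      # interest rate of 100%
Q = [0.2, 0.3]                # mortality rates for years 1 and 2
R = [0.20, 0.0]               # percentage-of-premium expense
E = [18.0, 5.0]               # fixed expenses at the start of each year

def profit(V_start, V_end, premium, r, e, q_d, benefit):
    """Formula (12.4b) with no withdrawals and no settlement expenses."""
    return (V_start + premium * (1 - r) - e) * (1 + I) - q_d * benefit - (1 - q_d) * V_end

def profit_vector(reserves, premium=160.0, benefit=1000.0):
    pr = [0.0 - reserves[0]]  # profit at time 0 is minus the initial reserve
    for k in range(2):
        pr.append(profit(reserves[k], reserves[k + 1], premium, R[k], E[k], Q[k], benefit))
    return pr

def accumulated_profit(pr):
    """Profits accumulated to time 2 at interest, per policy in force at time 2."""
    in_force = [1.0, 1 - Q[0], (1 - Q[0]) * (1 - Q[1])]
    total = pr[0] * (1 + I) ** 2
    for k in (1, 2):
        total += in_force[k - 1] * pr[k] * (1 + I) ** (2 - k)
    return total / in_force[2]

pr_expense_augmented = profit_vector([0.0, 5.0, 0.0])
pr_gross_premium = profit_vector([-12.0, -5.0, 0.0])
print([round(x, 2) for x in pr_expense_augmented], round(accumulated_profit(pr_expense_augmented), 2))
print([round(x, 2) for x in pr_gross_premium], round(accumulated_profit(pr_gross_premium), 2))
```

Both bases accumulate to the same amount, which (since the reserve at time 2 is zero) is the asset share at time 2.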


12.4.2 Gains by source 


We now elaborate on the decomposition of gains by source, as initiated in Section 6.4.1. We 
will in fact consider a somewhat more general situation. Suppose we have two different bases 
under consideration, one in which quantities are starred and the other unstarred. In the most 
common type of application the unstarred symbols refer to the expected basis, and the starred 
will refer to what actually happened, but mathematically these could be any two bases. We 
then have two profit calculations, $Pr^*_k$ and $Pr_k$, and we are interested in the difference $Pr^*_k - Pr_k$,
which in the common situation will be the excess of actual profit over the expected profit.
In the case where the reserve basis coincides exactly with the unstarred basis, we have that
$Pr_k = 0$ for all k and the difference will just be $Pr^*_k$. This was precisely the point of view we
took in Chapter 6. The goal is to decompose this difference into the various gains by source. 
As well as the mortality and interest gains of the Chapter 6 model, we now have additional 
sources. 

One is the gain from withdrawal, which is calculated analogously to the mortality gain as

$$(q^{(w)}_{x+k} - q^{*(w)}_{x+k})(cv_k - {}_{k+1}V).$$

In addition we have gains from the various types of expenses. These present a complication,
since they are invariably tied up with other factors. Consider for example the case of
death settlement expenses. As a particular example, suppose we have $q^{(d)}_{x+k} = 0.07$, $q^{*(d)}_{x+k} = 0.05$,
$e^d_k = 200$ and $e^{*d}_k = 100$. So our mortality experience is more favourable than expected
and the costs of settling each claim are less than expected. We gain in both ways and in fact 
the total gain per policy will be 0.07(200) — 0.05(100) = 9. We would like to divide this gain 
up into that attributable to mortality, and that attributable to lower settlement expenses. A 
somewhat surprising fact is that there seems to be no reasonable way to do this uniquely. To 
illustrate, consider the general situation, which applies as well to withdrawal expenses. (For 
simplicity in notation we will omit superscripts and subscripts.) 
We have a total gain of $(qe - q^*e^*)$. Now we can write this in two distinct ways, namely

$$(q - q^*)e + (e - e^*)q^* \quad \text{or} \quad (q - q^*)e^* + (e - e^*)q.$$

In either case it would be reasonable to take the first term to be the mortality gain and the
second term to be the expense gain. However, there is nothing to choose between them, 
and they give different answers. One way to decide is to determine an order in which we 
will calculate the gains. If we decide to look at mortality first, it would be natural to ignore 
the expense differential at this point and take (q — q*)e as the mortality gain, as in the first 
expression. Then necessarily we would take (e — e*)q* as the expense gain. If we decide to 
look at expense first, we would decompose according to the second expression. In short, for 
the factor considered second, the gain will be expressed with the other factor starred. So in 
our numerical example above, we could divide the total gain of 9 into 4 for mortality and 5 for
expenses, or 2 for mortality and 7 for expenses, depending on which decomposition we used.

Another way of viewing this is to note that we can also write the total gain in a symmetric 
fashion as 


$$qe - q^*e^* = (q - q^*)e^* + q^*(e - e^*) + (q - q^*)(e - e^*).$$


The first term on the right is strictly mortality gain, the second term is strictly expense 
gain and the third term, which is a combined effect, can be allocated to one or the other. 

This seems rather arbitrary and artificial but it is the best we can do if we insist on a 
decomposition. 
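
The dependence on order is easy to see numerically. The sketch below applies the two ordered decompositions, and one symmetric three-term split, to the settlement expense figures used above (q = 0.07, q* = 0.05, e = 200, e* = 100). The function names are illustrative, and the particular symmetric form coded here is one valid choice among several algebraically equivalent ones.

```python
def ordered_split(q, q_star, e, e_star, first="mortality"):
    """Return (mortality gain, expense gain); the factor treated second
    is valued with the other factor starred."""
    if first == "mortality":
        return (q - q_star) * e, (e - e_star) * q_star
    return (q - q_star) * e_star, (e - e_star) * q

def symmetric_split(q, q_star, e, e_star):
    """Return (pure mortality, pure expense, combined effect)."""
    return (q - q_star) * e_star, q_star * (e - e_star), (q - q_star) * (e - e_star)

q, q_star, e, e_star = 0.07, 0.05, 200.0, 100.0
total = q * e - q_star * e_star

mortality_first = ordered_split(q, q_star, e, e_star, first="mortality")
expense_first = ordered_split(q, q_star, e, e_star, first="expense")
three_way = symmetric_split(q, q_star, e, e_star)

print(round(total, 2))
print([round(x, 2) for x in mortality_first])
print([round(x, 2) for x in expense_first])
print([round(x, 2) for x in three_way])
```

Every split sums to the same total; what changes is only which factor absorbs the combined term.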

Indeed we have already encountered this problem before, namely in the example involving 
premium difference gain in Chapter 6, a gain which is tied in with interest. Since we had 
previously talked about interest gain, it seemed natural to do that first, but we could have 
instead taken the premium difference gain as $(\pi^* - \pi)(1 + i)$, leaving the total interest gain as
$({}_kV + \pi^*)(i^* - i)$.

The same problem comes with periodic expense gains which also involve interest. If, for
example, we focus on interest first, the total gain of $e_k(1 + i_k) - e^*_k(1 + i^*_k)$ will be decomposed
into a part due to interest of $e_k(i_k - i^*_k)$ and a part due to expenses of $(e_k - e^*_k)(1 + i^*_k)$. (Note
that earning more interest than expected will result in the first item being negative. The reader
should ensure that they can explain why this is so.)

What would happen if there were a source of gain connected with three different factors? 
We'll leave it to the reader to verify that there would now be six possible ways of dividing 
up the gain. Theoretically this could happen. For example, a percentage of premium expense 
would be affected by differences in both premiums and interest. In the common situation we 
have described, this will not occur since the premiums payable on both the actual profit basis 
and the expected profit basis will be the same, namely the gross premium, and the premium 
difference gain will be zero. 


Example 12.4 You are given the following information about a policy issued at age 50.
The gross premium payable at age 65 is 2000, and a death benefit of 100 000 is payable at
age 66 for death occurring between age 65 and 66. A cash value of 20 000 is paid at age 66
for policyholders surrendering at that time. Moreover ${}_{15}V = 20\,300$ and ${}_{16}V = 22\,500$.

The insurer's best estimate for mortality, withdrawal and interest is given by

$$q^{(d)}_{65} = 0.005, \quad q^{(w)}_{65} = 0.008, \quad i = 0.04.$$

Anticipated expenses at the beginning of the year are 5% of the gross premium, plus 100. In
addition, the expected settlement expenses are 200 for a death claim and 100 for withdrawal.
Now here is the actual experience. Out of 1000 policies at the beginning of the 16th policy
year, there were 4 deaths and 10 withdrawals. The actual interest rate earned was 0.048. The 
actual expenses incurred were 230 per policy at the beginning of the year, including both 
percentage of premium and fixed expenses, 150 to settle each death claim, and 120 for each 
withdrawal. 

For the year running from time 15 to time 16, calculate the actual profit less the expected 
profit, and classify this difference by source. For the latter, assume that all expense gains are 
calculated after the gains from the other sources. 


Solution. Taking the actual results for the starred quantities, and using (12.4b),

$$Pr^*_{16} = (20\,300 + 2000 - 230)(1.048) - 0.004(100\,150) - 0.010(20\,120) - 0.986(22\,500) = 342.56,$$

$$Pr_{16} = (20\,300 + 2000(1 - 0.05) - 100)(1.04) - 0.005(100\,200) - 0.008(20\,100) - 0.987(22\,500) = 114.70.$$

Actual profit - Expected profit = 227.86.


This can be decomposed as follows:


Gain from mortality (including the portion due to settlement expenses) = (0.005 - 0.004)(100 200 - 22 500) = 77.70.

Gain from withdrawal (including the portion due to withdrawal expenses) = (0.008 - 0.010)(20 100 - 22 500) = 4.80.

Gain from interest = (20 300 + 2000 - 200)(0.048 - 0.040) = 176.80.

Gain from percentage of premium and periodic expenses = (200 - 230)(1.048) = -31.44.

Death settlement expense gain = 0.004(200 - 150) = 0.20.

Withdrawal expense gain = 0.010(100 - 120) = -0.20.


The expense gains above, coming after the other calculations, all use the actual experience 
for the other factors, as we noted in our remarks about order above. The reader should find it 
instructive to redo the calculations, now assuming that the expense gains are calculated first. 

Note that since the insurer benefits from a withdrawal, paying out less than the reserve, 
the higher than expected number of withdrawals caused a positive gain. 
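
A decomposition of this kind is easy to get wrong by a sign or a decimal place, so it is worth checking that the six gains add up to the difference in profits. The sketch below does this for Example 12.4, with expense gains computed last as the example requires. The dictionary layout and names are illustrative choices made here.

```python
V15, V16, PREMIUM = 20_300.0, 22_500.0, 2_000.0
DEATH_BENEFIT, CASH_VALUE = 100_000.0, 20_000.0

expected = dict(q_d=0.005, q_w=0.008, i=0.040,
                start_exp=0.05 * PREMIUM + 100, e_d=200.0, e_w=100.0)
actual = dict(q_d=4 / 1000, q_w=10 / 1000, i=0.048,
              start_exp=230.0, e_d=150.0, e_w=120.0)

def profit(basis):
    """Profit for the year from time 15 to 16, in the form of (12.4b)."""
    fund = (V15 + PREMIUM - basis["start_exp"]) * (1 + basis["i"])
    fund -= basis["q_d"] * (DEATH_BENEFIT + basis["e_d"])
    fund -= basis["q_w"] * (CASH_VALUE + basis["e_w"])
    return fund - (1 - basis["q_d"] - basis["q_w"]) * V16

x, a = expected, actual
gains = {
    # Non-expense sources first: expenses stay at their expected values.
    "mortality": (x["q_d"] - a["q_d"]) * (DEATH_BENEFIT + x["e_d"] - V16),
    "withdrawal": (x["q_w"] - a["q_w"]) * (CASH_VALUE + x["e_w"] - V16),
    "interest": (V15 + PREMIUM - x["start_exp"]) * (a["i"] - x["i"]),
    # Expense sources last: the other factors take their actual values.
    "periodic expenses": (x["start_exp"] - a["start_exp"]) * (1 + a["i"]),
    "death settlement expenses": a["q_d"] * (x["e_d"] - a["e_d"]),
    "withdrawal expenses": a["q_w"] * (x["e_w"] - a["e_w"]),
}
difference = profit(actual) - profit(expected)

for source, gain in gains.items():
    print(f"{source:28s}{gain:10.2f}")
print(f"{'total':28s}{sum(gains.values()):10.2f}")
print(f"{'actual - expected':28s}{difference:10.2f}")
```

Redoing the calculation with the expense gains computed first amounts to swapping which basis supplies the rates and interest in the last three entries and which supplies the expenses in the first three.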


12.4.3 Profit testing 


We now return to the case where we have a proposed profit test basis, and proposed gross 
premiums. We want to compute the profits over several periods, and see how we can use the 
resulting figures. 


Example 12.5 A five-year, 100 000 endowment insurance policy on (60) has benefits
payable at the end of the year of death and carries annual premiums of 21 500, payable for
5 years. Initial expenses are 60% of the premium plus 200, and renewal expenses are 5% of
the premium plus 100. Death settlement expenses for the first 4 years are 0.2% of the face
amount plus 50. The expense of paying any cash value, or the final payment at time 5, is 50.
The profit test basis interest rate is 5%. The profit test basis mortality and withdrawal rates,
reserves, and cash values are shown in the following table. Find $Pr_k$ for k = 0, 1, ..., 5.

 k   q^(d)_{60+k-1}   q^(w)_{60+k-1}      _kV     cv_{k-1}       Pr_k

 0                                          0           0           0
 1        0.009            0.080       18 500           0    -8935.75
 2        0.010            0.050       37 500      17 500     3636.25
 3        0.012            0.030       57 500      42 500     3151.75
 4        0.014            0.020       78 500      70 000     3080.75
 5          -                -        100 000           0     3716.25


Solution. The values of $Pr_k$ are computed directly from (12.4b) and are shown in the final
column. As an example, here is the computation of $Pr_2$.


$$Pr_2 = (18\,500 + 21\,500 - 0.05(21\,500) - 100)(1.05) - 0.01(100\,250) - 0.05(17\,550) - 0.94(37\,500) = 3636.25.$$
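
The whole final column of the table can be reproduced with a short loop. The sketch below is an illustration with names chosen here. Two points are assumptions inferred from the tabulated figures rather than stated rules: the 50 expense for paying a cash value is charged only when the cash value is positive (this is needed to obtain -8935.75 in year 1), and in year 5 every policy in force receives 100 000 with a payment expense of 50, whether by death or by maturity.

```python
PREMIUM, FACE, I = 21_500.0, 100_000.0, 0.05
q_d = [0.009, 0.010, 0.012, 0.014]           # mortality rates, years 1 to 4
q_w = [0.080, 0.050, 0.030, 0.020]           # withdrawal rates, years 1 to 4
V = [0.0, 18_500.0, 37_500.0, 57_500.0, 78_500.0, 100_000.0]
cv = [0.0, 17_500.0, 42_500.0, 70_000.0]     # cash value at the end of years 1 to 4

def start_of_year_expense(year):
    return 0.60 * PREMIUM + 200 if year == 1 else 0.05 * PREMIUM + 100

def yearly_profit(year):
    fund = (V[year - 1] + PREMIUM - start_of_year_expense(year)) * (1 + I)
    if year == 5:
        # Assumption: death or survival both lead to 100 000 plus an expense of 50.
        return fund - (FACE + 50)
    d, w = q_d[year - 1], q_w[year - 1]
    death_cost = FACE + 0.002 * FACE + 50
    # Assumption: no payment expense when there is no cash value to pay.
    withdrawal_cost = cv[year - 1] + 50 if cv[year - 1] > 0 else 0.0
    return fund - d * death_cost - w * withdrawal_cost - (1 - d - w) * V[year]

profit_vector = [0.0] + [yearly_profit(k) for k in range(1, 6)]
print([round(p, 2) for p in profit_vector])
```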


The vector $(Pr_0, Pr_1, \ldots)$ is sometimes called the profit vector. It is not however the vector
we want in order to accomplish our goal of analyzing profits. Recall that $Pr_k$ is the profit in
the year running from time k - 1 to time k for a policy in force at time k - 1. The figure we
want is the expected profit for that year. Accordingly we make the following definition.


Definition 12.1 The profit signature for a policy issued at age x is the vector $\Pi$, where
$\Pi_0 = Pr_0$, and for k > 0

$$\Pi_k = {}_{k-1}p^{(\tau)}_x\, Pr_k.$$


In other words we multiply the yearly profits by the probability that the policy holder will 
survive to the beginning of the particular year. 


Example 12.6 Find the profit signature for Example 12.5. 
Solution. We first calculate 


$p_x^{(\tau)} = 0.911, \quad {}_2p_x^{(\tau)} = 0.85634, \quad {}_3p_x^{(\tau)} = 0.82037, \quad {}_4p_x^{(\tau)} = 0.79248.$

A straightforward calculation then yields 


$\Pi = (0, -8935.75, 3312.62, 2698.97, 2527.37, 2945.06).$
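
The "straightforward calculation" can be sketched in a few lines of Python. This is only an illustration of Definition 12.1 applied to the figures of Example 12.5. The function name `profit_signature` and the list layout are assumptions of this sketch, not notation from the text.

```python
# Sketch: profit signature = yearly profit times probability of being in force at start of year.
pr = [0.0, -8935.75, 3636.25, 3151.75, 3080.75, 3716.25]  # profit vector from Example 12.5
q_death = [0.009, 0.010, 0.012, 0.014]
q_withdraw = [0.080, 0.050, 0.030, 0.020]


def profit_signature(pr, q_death, q_withdraw):
    # survival[j] is the probability that the policy is still in force at time j.
    survival = [1.0]
    for q_d, q_w in zip(q_death, q_withdraw):
        survival.append(survival[-1] * (1 - q_d - q_w))
    signature = [pr[0]]
    for k in range(1, len(pr)):
        signature.append(survival[k - 1] * pr[k])
    return survival, signature


survival, signature = profit_signature(pr, q_death, q_withdraw)
```

Rounded to two decimals, `signature` matches the vector given above, and `survival[1:]` matches the four survival probabilities.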

Our problem of analyzing profits is now reduced to the familiar problem, dealt with from 
the beginning of this book, of analyzing a sequence of cash flows. 

For example we can summarize this cash flow sequence with a single number, by computing 
the present value of the profit signature according to some discount function. This gives 
the net present value of profits, often abbreviated as NPV. In practice the discounting is at 
a constant rate of interest which is called the risk discount rate. We can interpret this rate 
as the return desired by an investor putting up the capital to fund an insurance 
enterprise. It will therefore reflect the risk involved in such an investment and will accordingly 
normally be higher than, say, the profit test rate. Note that we do not discount by survivorship in 
calculating the NPV. This has already been considered when calculating the profit signature. 

Recall now that we have calculated this NPV according to a single policy, with a specified 
amount of benefits and premiums. To interpret it properly and obtain measures which can be 
used for comparative purposes, we must relate this to some measure of volume. A natural 
candidate for this is the present value of premiums. 


Definition 12.2 We define the profit margin for the policy as the quantity 


$\dfrac{NPV}{\ddot{a}^{(\tau)}_x(\pi)}$


where $\pi$ is the premium vector for the policy. The discounting in the denominator is done 
with both interest, at the same risk discount rate used in the numerator, and with survivorship 
as given by the multiple-decrement table. 


Example 12.7 If the risk discount rate is 0.07, find the profit margin for the policy of 
Example 12.5. 


Solution. The denominator is just $21\,500\,\ddot{a}^{(\tau)}_{60}(1_5)$ at 7% interest $= 83\,282.65$. 
The numerator is 


$(1.07)^{-1}(-8935.75) + (1.07)^{-2}(3312.62) + (1.07)^{-3}(2698.97) + (1.07)^{-4}(2527.37) 
 + (1.07)^{-5}(2945.06) = 773.27.$


The profit margin is $773.27/83\,282.65 = 0.0093$. 
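
The arithmetic of this example can be checked with the following Python sketch. It takes the rounded profit signature and survival probabilities from Example 12.6 as inputs, so the results agree with the text only up to rounding. The variable names are illustrative.

```python
# Sketch: NPV of the profit signature and the profit margin at a 7% risk discount rate.
signature = [0.0, -8935.75, 3312.62, 2698.97, 2527.37, 2945.06]
survival = [1.0, 0.911, 0.85634, 0.82037, 0.79248]  # in-force probabilities at times 0..4
premium = 21_500.0
risk_rate = 0.07

# Numerator: discount the profit signature with interest only.
npv = sum(profit / (1 + risk_rate) ** k for k, profit in enumerate(signature))

# Denominator: premiums are due at times 0..4, discounted with interest and survivorship.
pv_premiums = premium * sum(s / (1 + risk_rate) ** k for k, s in enumerate(survival))

profit_margin = npv / pv_premiums
```

Here `npv` comes out at about 773.27, `pv_premiums` within a few cents of 83 282.65, and `profit_margin` at about 0.0093.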


This example shows then that for the policy in question, for every dollar in premium 
received in sales, 0.93 cents would represent profit. This can be used for various decision 
problems. If it is deemed lower than the company would like, then some changes must be made 
in the policy structure, usually by raising the premium charged, or in some cases by lowering 
the surrender benefits if allowed by applicable legislation. The profit margin can also be 
compared with other policies to see which types of benefit and premium structures are most 
profitable. 

A natural question now arises. How is the profit margin affected by the choice of reserves? 
Our calculations in Section 6.5 suggest that under certain conditions the reserve basis should 
not affect the profit margin, and this indeed is true when the discount rates used in computing 
the NPV are the same as those of the profit test basis. 

To see this, look at formulas (12.3) and (12.5). Suppose that the single reserve figure ${}_kV$ 
is increased by an amount $E$. Then $\Delta_{k-1}(V)$ is increased by $p^{(\tau)}_{x+k-1}E$ so that 


$Pr_k$ is decreased by $p^{(\tau)}_{x+k-1}E$. 

Similarly, $\Delta_k(V)$ is decreased by $E(1 + i_k)$ so that 

$Pr_{k+1}$ is increased by $(1 + i_k)E$, 


resulting in a decrease in $\Pi_k$ of ${}_kp_x^{(\tau)}E$ and an increase in $\Pi_{k+1}$ of 
${}_kp_x^{(\tau)}(1 + i_k)E$, which will cancel out when we discount, leaving the NPV unchanged. 

This is not true, however, in the more typical case where the risk discounting is at higher 
rates. Suppose that $E > 0$. The calculation above shows that discounting the increase in $\Pi_{k+1}$ 
with a rate higher than $i_k$ for the year running from time $k$ to $k + 1$ will decrease the NPV. 
This is easily explained. The higher reserve means that more money must be set aside at time 
$k$, reducing profits at that time. This money is earning interest and will add to the profits the 
following year. However, under the present scenario, it is not earning sufficient interest to 
compensate. The conclusion is that in the usual case where risk discount rates are higher than 
the profit test rates, increases in reserves cause reduced profitability, and decreases in reserves 
cause increased profitability. 
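
This conclusion can be checked numerically on the policy of Example 12.5. The Python sketch below is an illustration, not part of the text. It rebuilds the NPV from the example data, using the same inferred profit formula and the same assumption that the 50 payment expense applies only when a cash value is paid. It then raises the time-2 reserve by a hypothetical amount of 1000. The function name `npv` and the size of the increase are assumptions of this sketch.

```python
# Sketch: effect of a reserve increase on the NPV at two different risk discount rates.
PREMIUM, FACE, TEST_RATE = 21_500.0, 100_000.0, 0.05
Q_DEATH = [0.009, 0.010, 0.012, 0.014]
Q_WITHDRAW = [0.080, 0.050, 0.030, 0.020]
CASH_VALUES = [0.0, 17_500.0, 42_500.0, 70_000.0]
BASE_RESERVES = [0.0, 18_500.0, 37_500.0, 57_500.0, 78_500.0]  # reserves at times 0..4


def npv(reserves, risk_rate):
    """NPV of the profit signature for the given reserves and risk discount rate."""
    total, in_force = 0.0, 1.0
    for year in range(1, 6):
        expenses = 0.60 * PREMIUM + 200.0 if year == 1 else 0.05 * PREMIUM + 100.0
        fund = (reserves[year - 1] + PREMIUM - expenses) * (1 + TEST_RATE)
        if year < 5:
            q_d, q_w = Q_DEATH[year - 1], Q_WITHDRAW[year - 1]
            cash_value = CASH_VALUES[year - 1]
            stay = 1 - q_d - q_w
            surrender_cost = q_w * (cash_value + 50.0) if cash_value > 0 else 0.0
            profit = fund - q_d * (1.002 * FACE + 50.0) - surrender_cost - stay * reserves[year]
        else:
            stay = 0.0
            profit = fund - (FACE + 50.0)  # maturity payment plus its expense
        # Weight by the probability of being in force at the start of the year, then discount.
        total += in_force * profit / (1 + risk_rate) ** year
        in_force *= stay
    return total


bumped = list(BASE_RESERVES)
bumped[2] += 1000.0  # hypothetical increase E = 1000 in the time-2 reserve

change_at_test_rate = npv(bumped, 0.05) - npv(BASE_RESERVES, 0.05)
change_at_risk_rate = npv(bumped, 0.07) - npv(BASE_RESERVES, 0.07)
```

With discounting at the profit test rate of 5% the change is zero up to floating-point error, while at 7% the NPV falls by roughly 14, in line with the argument above.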


Notes and references 


See Bowers et al. (1997, Chapter 16) for a detailed discussion of various modified reserve 
systems. 

Net premium reserves are sometimes referred to as benefit reserves in the literature, since 
they involve only benefits and not expenses. The difference between the expense-augmented 
reserve and the net premium reserve is sometimes referred to as the expense reserve. It 
represents the present value of the future premiums available for expenses, less the present 
value of the future expenses. 

Some authors treat certain expenses incurred right at or just before a policy begins 
differently than above. They would appear as a negative contribution to $Pr_0$. In our treatment they 
are accumulated to the end of the year, and appear as a negative contribution to $Pr_1$, in the 
same way as the initial premium received at time 0 makes a positive contribution to $Pr_1$. 


Exercises 


Type A exercises 


12.1 An insurance policy issued at age x provides for a payment of 100 000 at the moment 
of death provided this occurs within 20 years. Level annual premiums of G are payable 
for 10 years. The expenses are as follows: in the first year, 60% of the initial premium, 
1% of the face amount and 20 per policy; after the first year, 5% of each premium, 
0.25% of the face amount and 10 per policy; the death benefit settlement expense is 
50 per policy plus 0.5% of the face amount. Assume that the expenses for any year 
are paid at the beginning of the year. Find a formula for G. 


12.2 An insurance policy issued at age x has death benefit vector b paid at the end of the year 
of death, and level annual premiums payable for n years. Assume a constant interest 
rate. Given $A_x(b) = 330$, $A_{x+1}(b \circ 1) = 280$, $\ddot{a}_x(1_n) = 11$, $\ddot{a}_{x+1}(1_{n-1}) = 8$ and 
$\ddot{a}_{x+k}(1_{n-k}) = 6$, find the excess of the reserve at time k calculated with net premiums 
over the reserve at time k calculated by the full preliminary term method. 


12.3 A certain policy on (x) has gross-premium reserve of 1000 at time 10. The gross 
premium payable at time 10 is 100. The death benefit payable at the end of year of 
death for death between time 10 and time 11 is 1500. The cash value payable at time 
11 is 900. Expenses for the year running from time 10 to time 11 are 20, paid at the 
beginning of the year. The interest rate $i = 10\%$, $q^{(d)}_{x+10} = 0.08$ and $q^{(w)}_{x+10} = 0.05$. Find 
the gross premium reserve at time 11. 


12.4 A policy on (x) provides death benefits of 1000 paid at the end of the year of death. 
There is no cash value for withdrawal in the first year. For withdrawal in the second 
year, an amount of 30 is paid at the end of the year. You are given that 


q2 =0.1,  q® 202,  q?-005 q% =0.06, i= 0.07. 
Expenses are 120 in the first year and 40 thereafter, paid at the beginning of the year. 
The level expense-augmented premium payable annually is 250 and the corresponding 
gross premium is 300. The gross-premium reserve at time 0 is —130. Find (i) the 
expense-augmented reserve, (ii) the gross-premium reserve, (iii) the asset share, all at 
the end of 2 years. 


Type B exercises 


12.5 A modified reserve system has a first-year premium of $b_0 v q_x$, a second-year premium 
of $b_1 v q_{x+1}$, followed by a level renewal premium of $\beta$. Assume constant interest and 
benefits at the end of the year of death. Find an expression for $\beta$, when premiums are 
payable for n years. 


12.6 Consider a policy issued at age x with benefits payable at the end of the year of death. 
A modified reserve system, used in Canada for tax purposes (sometimes known as 
the 1½ year preliminary term), takes an initial premium of $b_0 v q_x$, a premium at time 1 of 
$\beta$, and a level premium of $\gamma$ payable after time 1, such that $\beta$ is the average of $b_1 v q_{x+1}$ 
and $\gamma$. Assume constant interest and benefits paid at the end of the year of death. If 
premiums are payable for life, show that 


$\gamma = \dfrac{A_{x+1}(b \circ 1) + v p_{x+1} A_{x+2}(b \circ 2)}{\ddot{a}_{x+1} + v p_{x+1} \ddot{a}_{x+2}}.$


12.7 Suppose that in Example 12.3 the insurer sells 100 of the given policies. Show that the 
total profits accumulated with interest to time 2 are the same, regardless of whether 
gross premium or expense-augmented reserves are used. 


12.8 Redo Example 12.4 with the expense gains calculated before those from mortality, 
withdrawal and interest. 


12.9 Refer to Example 12.5. 


(a) Suppose that ${}_1V$ is changed to 13 000 while all other data remains the same. 
Calculate the new profit signature and profit margin. 


(b) Redo part (a) with the risk discount rate equal to 5%. 


(c) Find the profit margin for the original problem with a risk discount rate equal 
to 5%. 


12.10 Refer to Example 12.5. Suppose we decide to increase the premium to achieve a profit 
margin of 3%. What should the new premium be? 


*13 


Specialized topics 


For the concluding chapter in Part I of this book, we cover a few specialized topics in insurance 
and annuities that are of current importance. We will not go into any of these topics in depth, 
but rather provide an introduction to some of the major features. The material here is not 
needed for any other chapters of the book. 


13.1 Universal life 


13.1.1 Description of the contract 


Universal life is a type of contract that began in the 1970s, and now accounts for a substantial 
portion of life insurance sales. Prior to that time, the mainstay of life insurance was for the 
most part whole life or endowment contracts. The origin of the change could well have been 
the advice given by some financial advisors that people should buy term insurance instead 
of those products, for a much smaller premium, and then invest the difference elsewhere. 
Now our analysis in Section 6.4.2 shows that this is exactly what happens within the policy 
itself when one purchases whole life or endowment insurance. The excess premiums, not 
needed to provide insurance in a particular year, are invested in the savings portion. Of 
course, the purchaser of endowment or whole life insurance has less flexibility than the term 
buyer, both with regard to the relative amounts going into the insurance and savings portions, 
and to the types of investments and rates of return. Universal life plans were developed to 
allow this flexibility within a single contract. The main features involve variable premium 
payments, and the ability for the policyholder to participate in higher yielding investment 
opportunities. 

In the usual type of universal life plan, premiums are not fixed in advance and may 
be varied at the option of the purchaser. Each individual has in effect a separate fund that 
changes over time in the way we describe it by the recursion formula (6.8). Premiums paid 
are deposited, the fund is credited with interest or investment earnings, while expenses and 
the cost of insurance, based on the net amount at risk, are deducted. The policyholder need 
only pay sufficient premiums to ensure that the fund value will cover the cost of insurance. 

In this type of plan where there is a more direct relationship between policyholders and 
their individual accounts, there is a more frequent demand for the type of plan that we 
discussed in Section 6.7. It is usual to allow two options. The purchaser can fix the amount 
paid at death, or they can fix the net amount at risk, so at death, the amount in their account 
(representing the reserve) would be returned in addition to the stipulated face amount, giving 
an increasing death benefit. In the usual terminology the fixed death benefit is often referred 
to as a Type A policy, whereas the death benefit plus account value is referred to as type B. 

Some plans provide the flexibility to alter the face amount as well as the premium, with 
the stipulation that a request to increase the coverage would normally necessitate the type of 
information as required by new policyholders as to health and other conditions affecting the 
risk. There do arise cases however when such changes are mandated. In the United States 
this occurs due to regulations stipulating that in order to receive the favourable tax treatment 
accorded to life insurance, the death benefit must exceed the fund value by a certain minimum 
percentage. If the fund value gets high enough, the insurer will increase the death benefit in 
order to maintain this so-called corridor requirement. 

Policyholders can lapse the policy at any time and receive their fund value less a so-called 
surrender charge that is intended mainly to account for the initial expenses, as we outlined 
in Section 6.6. The surrender charge will decrease with time and normally will become zero 
after a certain number of years, often 20 or so. 

As well as the flexibility in premium payments a major attraction of universal life for the 
purchaser is the opportunity to earn a higher yield. The credited interest rate is usually not 
fixed in advance as it essentially is in traditional plans, but it is allowed to vary. There are 
different ways of accomplishing this. In some cases the funds of the policyholder are actually 
invested in certain assets, so that the account is very much like a mutual fund investment 
together with guaranteed benefits payable on death. Usually there is a choice of various types 
of assets or funds to invest in, so the policyholder can control the amount of risk they are 
willing to take in order to obtain higher returns. This type of contract is often referred to as 
variable universal life. In other cases the account of the policyholder need not be invested in 
any particular assets but interest is credited according to earnings on some reference portfolio. 
Such contracts are known as equity indexed insurance. There are normally some qualifications 
to the crediting of interest. There will be a stipulated minimum rate of interest that is credited 
regardless of what the reference portfolio does. It is also common to have limits on the other 
end. There may be a ‘participation rate’. So, for example, if the reference portfolio earned 
9% interest over a period and the participation rate was 80%, the policyholder’s account 
would be credited with 7.2%. In addition there is often a stated maximum that will be credited 
regardless of the actual earnings. 
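
The crediting rules just described can be sketched as a small Python function. This is an illustration only. The order in which the participation rate, minimum and maximum are applied is an assumption, since contracts differ, and the floor of 2% and cap of 10% used in the usage lines are hypothetical figures that do not come from the text.

```python
# Sketch: credited rate for an equity-indexed account (order of operations is assumed).
def credited_rate(reference_return, participation=1.0, floor=0.0, cap=None):
    """Apply the participation rate, then the guaranteed minimum, then the stated maximum."""
    rate = participation * reference_return
    rate = max(rate, floor)
    if cap is not None:
        rate = min(rate, cap)
    return rate


# The example from the text: 9% earned, 80% participation.
example_rate = credited_rate(0.09, participation=0.80)

# Hypothetical floor and cap, to show the limits at both ends.
poor_year = credited_rate(-0.05, participation=0.80, floor=0.02, cap=0.10)
strong_year = credited_rate(0.20, participation=0.80, floor=0.02, cap=0.10)
```

Here `example_rate` is 7.2% as in the text, `poor_year` is held at the assumed floor, and `strong_year` is limited by the assumed cap.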

Universal life contracts often carry other guarantees that mitigate against unfavourable 
investment experience. These are referred to as secondary guarantees. A popular option of 
this type is that, as long as the insured keeps up a certain stipulated minimum rate of premium 
payment, the death benefit is guaranteed even though lower than expected rates of return 
bring the person’s account below the amount needed to meet the costs of insurance. A similar 
provision is commonly used with endowment type contracts that have a maturity date, in 
which case a minimum guaranteed amount is paid at maturity provided that the insured keeps 
up a minimum schedule of premium payments. 


13.1.2 Calculating account values 


For each policyholder, the amount of money in their account is calculated periodically, often 
monthly. For the most part this is handled easily by the basic recursion formulas that we 
have seen in previous chapters. There are however some aspects of terminology and special 
features that will be discussed. Given the account value $AV_k$ at time $k$, the account value 
$AV_{k+1}$ is calculated using the following familiar quantities. We assume aggregate mortality 
and payment of benefits at the end of the period of death. 


(a) $\pi_k$, the premium paid at time $k$. 


(b) $r_k$, the premium expense rate. This is a percentage, often around 5%, taken from each 
premium by the insurer to handle expenses. Only the remaining amount (sometimes 
referred to as the allocated premium) is credited to the account. In the so-called 
unit linked policies popular in the United Kingdom, it is common to incorporate 
this expense by a device termed a bid-offer spread. In these cases the policyholder 
is deemed to have their own separate fund, distinct from the general assets of the 
company, and the premium is used to buy a number of units of the fund. These are 
bought at a certain offer price, and sold back at a lower bid price. So for example if 
the offer price was 100 per unit and the bid price was 95 per unit, and a premium of 
1000 was paid, the purchaser would own 10 units of the fund, which would have a 
value of 950. So the allocated premium would be effectively 950. 


(c) $e_k$, an additional flat amount that is charged each period for expenses. 


(d) $c_k$, the cost of insurance rate. This is just a mortality rate which will apply to the time 
period running from time $k$ to time $k + 1$. For a yearly calculation with an issue age of 
$x$, we would have $c_k = q_{x+k}$, for an appropriate life table. For a monthly calculation, 
assuming UDD, we would have $c_k = (1/12)q_{x+m}$, where $12m \le k < 12(m + 1)$. It is 
often expressed as a rate of so much per 1000. So for example a cost of insurance rate 
of 15 per 1000 would mean that $c_k = 0.015$. 


(e) $i_k$, the credited interest rate. This may be specified directly or tied to the performance 
of some other investments. 


(f) $b_k$, the death benefit paid at time $k + 1$ for death between time $k$ and $k + 1$. 


Then the recursion is just 

$AV_{k+1} = [AV_k + \pi_k(1 - r_k) - e_k](1 + i_k) - c_k\eta_k$, (13.1) 


where $\eta_k$ denotes the net amount at risk for the period. For type A policies we have 
$\eta_k = b_k - AV_{k+1}$, if this is positive, and we solve the equation for $AV_{k+1}$. For type B 
policies, where the account value is paid in addition to the stated death benefit, 
$\eta_k = b_k$. Another possible type of arrangement is where the fund itself is returned at death, 
subject to a guaranteed minimum amount $b_k$. (As mentioned above, this may be ruled out in 
certain jurisdictions by taxation requirements.) In such a case if $AV_{k+1} \ge b_k$, we would have 
$\eta_k = 0$, while if $AV_{k+1} < b_k$, we would have $\eta_k = b_k - AV_{k+1}$. 

It must be kept in mind that the account on a universal life policy should be viewed 
as belonging strictly to the particular policyholder, so while the account value is similar in 
nature to a retrospective reserve, it is not quite the same. For example, withdrawal rates and 
surrender values are not taken into account in this calculation as was done in some Chapter 12 
formulas. To do so would mean that policyholder accounts would be augmented with positive 
gains from surrender, but these belong strictly to the insurer (as of course do surrender losses, 
should they arise). 


Example 13.1 A universal life policy has a stated death benefit of 100 000, monthly 
premiums of 5000, monthly expenses of 100, and a further monthly expense charge of 4% of 
each premium. The annual credited interest rate is 8%, and the monthly cost of insurance rate 
in the 5th year of the policy is 30 per thousand. If the account value at the end of the 52nd 
month is 50 000, find the account value at the end of the 53rd month, assuming that the policy 
is (a) type B and (b) type A. 


Solution. We first must calculate the applicable monthly interest rate, which is 
$(1.08)^{1/12} - 1 = 0.00643$. Then 


(a) $AV_{53} = (50\,000 + 5000(0.96) - 100)(1.00643) - 0.03(100\,000) = 52\,051.72$. 


(b) The 100 000 above is replaced with $100\,000 - AV_{53}$, and solving the equation we just 
obtain the Type B amount divided by $(1 - 0.03)$. 


$AV_{53} = 52\,051.72/(1 - 0.03) = 53\,661.57$. 
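
The one-period update behind this example can be sketched in Python as follows. The function name `next_account_value` and its parameter names are illustrative, not notation from the text. The monthly rate is entered as the rounded value 0.00643 so that the output matches the figures in the solution. Using the unrounded rate would shift the results by a few tenths.

```python
# Sketch: one step of recursion (13.1) for type A and type B universal life policies.
def next_account_value(av, premium, expense_rate, flat_expense,
                       interest, coi_rate, face, policy_type):
    # Account value plus allocated premium less flat expense, with one period of interest.
    fund = (av + premium * (1 - expense_rate) - flat_expense) * (1 + interest)
    if policy_type == "B":
        # Net amount at risk is the full face amount.
        return fund - coi_rate * face
    if policy_type == "A":
        # Solve AV' = fund - c * (face - AV') for AV'.
        av_next = (fund - coi_rate * face) / (1 - coi_rate)
        # If the account already covers the face amount, nothing is at risk.
        return av_next if av_next < face else fund
    raise ValueError("policy_type must be 'A' or 'B'")


MONTHLY_RATE = 0.00643  # (1.08) ** (1 / 12) - 1, rounded as in the text

av_type_b = next_account_value(50_000.0, 5_000.0, 0.04, 100.0,
                               MONTHLY_RATE, 0.03, 100_000.0, "B")
av_type_a = next_account_value(50_000.0, 5_000.0, 0.04, 100.0,
                               MONTHLY_RATE, 0.03, 100_000.0, "A")
```

With these inputs `av_type_b` is about 52 051.72 and `av_type_a` about 53 661.57.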


Another calculation that will normally be done each period is to compute the cost of 
insurance (as opposed to the cost of insurance rate). This is sometimes abbreviated as $COI_k$, 
for the period running from time $k$ to $k + 1$. (Some authors prefer a different indexing method 
and would refer to this as $COI_{k+1}$. We prefer to have the index correspond to that of the cost 
of insurance rate $c_k$.) This is normally determined as at the beginning of each period, so if 
there are not sufficient funds to pay it, the policy would lapse in the absence of any guarantees 
to the contrary. In effect it can be viewed as the net single premium paid each period for the 
death benefit coverage. That is 


$COI_k = (1 + i_k)^{-1} c_k \eta_k$. 


So in the above example we would have 


$COI_{52} = (1.00643)^{-1}(0.03)(100\,000) = 2980.83$ for the type B policy, and 
$COI_{52} = (1.00643)^{-1}(0.03)(100\,000 - 53\,661.57) = 1381.27$ for the type A policy. 
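
As a quick check of these two figures, the cost of insurance can be computed directly from its definition. The Python sketch below uses the rounded monthly rate 0.00643 from Example 13.1. The function name `cost_of_insurance` is illustrative.

```python
# Sketch: cost of insurance = discounted (cost of insurance rate x net amount at risk).
def cost_of_insurance(coi_rate, net_amount_at_risk, credited_rate):
    return coi_rate * net_amount_at_risk / (1 + credited_rate)


MONTHLY_RATE = 0.00643  # rounded monthly credited rate from Example 13.1

# Type B: the net amount at risk is the stated death benefit.
coi_type_b = cost_of_insurance(0.03, 100_000.0, MONTHLY_RATE)

# Type A: the net amount at risk is the death benefit less the new account value.
coi_type_a = cost_of_insurance(0.03, 100_000.0 - 53_661.57, MONTHLY_RATE)
```

The two results are about 2980.83 and 1381.27 respectively.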


There are some variations which can arise. In some circumstances the insurer may use a 
discount rate $j$ in calculating the cost of insurance which is different from the credited interest 
rate $i_k$. In this case formula (13.1) would be modified to 


$AV_{k+1} = [AV_k + \pi_k(1 - r_k) - e_k - (1 + j)^{-1}c_k\eta_k](1 + i_k)$, (13.2) 


which shows directly the insurance costs being deducted at the start of the period. Clearly 
when $j = i_k$, the interest factors cancel and we just get back the same formula as before. 
Notice however that in general we still have the same formula as before if we adjust the $c_k$. 
We change this to a new cost of insurance rate given by 

$\tilde{c}_k = c_k\,\dfrac{1 + i_k}{1 + j}$. 


The following illustrates. 


Example 13.2 Redo Example 13.1 and calculate the cost of insurance, under the assumption 
of a 3% discount rate for the cost of insurance. 


Solution. We can do all calculations by changing $c_k$ from 0.03 to 
$\tilde{c}_k = 0.03(1.08)^{1/12}/(1.03)^{1/12} = 0.03012$. In the type B policy, this will change the 
value of $AV_{53}$ to 52 039.72 and $COI_{52}$ to $(1.00643)^{-1}(0.03012)(100\,000) = 2992.76$. In the 
type A policy, this will change the value of $AV_{53}$ to $52\,039.72/(1 - 0.03012) = 53\,655.83$ and 
$COI_{52}$ to $(1.00643)^{-1}(0.03012)(100\,000 - 53\,655.83) = 1386.96$. 
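
The equivalence between formula (13.2) and formula (13.1) with an adjusted cost of insurance rate can be confirmed numerically. The Python sketch below does this for the type B policy. The function names are illustrative, and the monthly rates are left unrounded, so the account value differs slightly from the rounded figure quoted in the solution.

```python
# Sketch: formula (13.2) versus formula (13.1) with the adjusted cost of insurance rate.
i = 1.08 ** (1 / 12) - 1  # monthly credited rate
j = 1.03 ** (1 / 12) - 1  # monthly discount rate used for the cost of insurance
c = 0.03                  # monthly cost of insurance rate

adjusted_c = c * (1 + i) / (1 + j)


def av_with_separate_discount(av, premium, expense_rate, flat_expense, eta):
    # Formula (13.2): the cost of insurance, discounted at j, is deducted at the start.
    start = av + premium * (1 - expense_rate) - flat_expense - c * eta / (1 + j)
    return start * (1 + i)


def av_with_adjusted_rate(av, premium, expense_rate, flat_expense, eta):
    # Formula (13.1) with c replaced by the adjusted rate.
    fund = (av + premium * (1 - expense_rate) - flat_expense) * (1 + i)
    return fund - adjusted_c * eta


av_13_2 = av_with_separate_discount(50_000.0, 5_000.0, 0.04, 100.0, 100_000.0)
av_13_1 = av_with_adjusted_rate(50_000.0, 5_000.0, 0.04, 100.0, 100_000.0)
```

Here `adjusted_c` rounds to 0.03012, and the two account values agree up to floating-point error.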


Another modification may be necessary because of the corridor requirement. In a close 
situation, we must test whether an account value is sufficiently low to meet this restriction. 
Let $cor_k$ denote the minimum allowable ratio of $b_k/AV_{k+1}$. If $AV_{k+1}\,cor_k > b_k$, we must redo 
the calculations. The death benefit will now be $cor_k\,AV_{k+1}$ and $\eta_k = (cor_k - 1)AV_{k+1}$. 


Example 13.3 For the policies of Example 13.1, find $AV_{53}$ and $COI_{52}$ assuming that 
$cor_{53} = 1.9$. 


Solution. For the type B policy the corridor requirement is clearly satisfied and the answer 
will remain as calculated above. 

For the type A policy, we check the original answer to see that $53\,661.57(1.9) = 101\,957$, 
which is above 100 000, so we must recalculate: 


$AV_{53} = (50\,000 + 5000(0.96) - 100)(1.00643) - 0.03(0.9\,AV_{53})$, 

and solving we get $AV_{53} = 53\,604.40$. The death benefit will increase to 101 848.36, and 


$COI_{52} = (1.00643)^{-1}(0.03)(0.9 \times 53\,604.40) = 1438.07$. 
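
The corridor test and the recalculation for the type A policy can be sketched in Python as follows. The function name `type_a_with_corridor` is illustrative, and the rounded monthly rate 0.00643 is used so that the results match the solution.

```python
# Sketch: type A account value with a corridor test (Example 13.3).
def type_a_with_corridor(fund, coi_rate, face, corridor):
    """Return (account value, death benefit) after applying the corridor requirement."""
    av = (fund - coi_rate * face) / (1 - coi_rate)  # ordinary type A value
    if corridor * av <= face:
        return av, face  # corridor satisfied, the face amount stands
    # Otherwise the net amount at risk is (corridor - 1) * AV, so solve
    # AV = fund - coi_rate * (corridor - 1) * AV for AV.
    av = fund / (1 + coi_rate * (corridor - 1))
    return av, corridor * av


MONTHLY_RATE = 0.00643
fund = (50_000.0 + 5_000.0 * 0.96 - 100.0) * (1 + MONTHLY_RATE)

av_53, death_benefit = type_a_with_corridor(fund, 0.03, 100_000.0, 1.9)
coi_52 = 0.03 * (death_benefit - av_53) / (1 + MONTHLY_RATE)
```

This gives an account value of about 53 604.40, a death benefit of about 101 848.36 and a cost of insurance of about 1438.07.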


13.2 Variable annuities 


A similar attempt to provide additional flexibility has become popular for certain deferred 
annuity contracts, which have come to be known as variable annuities. They correspond to the 
contract described in Example 5.8 to the extent that there is no survivorship accumulation prior 
to the annuity payments. They operate in the same basic manner as we have described 
above for universal life, except that no deductions are made for the cost of insurance. In fact 
some purchasers use these as a vehicle to accumulate money in a high yielding investment 
account without any intention of converting the funds to an annuity. The annuity aspect enters 
into the contract, however, since it is common to include provisions providing for the conversion 
of the funds into annuities at guaranteed rates of interest and mortality. 

As in the insurance case, an individual’s account may be invested in particular assets or it 
may be equity-indexed, with the credited interest tied to some reference portfolio. 

Similarly to universal life, there are several types of possible guarantees that have been 
designed to reduce the losses under bad investment experience. A common provision is 
a guaranteed minimum death benefit, whereby the account holder is promised a certain 
minimum return upon death, regardless of the account value. The death benefit amount could 
be the amount originally invested, or it could be that amount together with a certain fixed 
accumulation rate. A similar type of provision is the guaranteed minimum accumulation 
benefit, whereby at the end of a specified period the account holder is promised a minimum 
account value, regardless of the investment experience. Still another option is the guaranteed 
minimum withdrawal benefit. Under this option the account holder is allowed to withdraw a 
minimum amount (or a minimum percentage) from their account each year, until the original 
invested value is returned. Evaluation of these benefits requires a knowledge of option pricing. 
An introduction to this topic is found in Chapter 20. 


13.3 Pension plans 


Pension plans are set up by companies to provide retirement income to a group of employees. 
This is a vast subject and we confine ourselves here to providing a survey of principal features 
and definitions. Pension plans can be classified into two main categories, defined benefit 
plans, abbreviated as DB, and defined contribution plans, abbreviated as DC, as introduced in 
Section 4.6.2. We now provide more detail. 


13.3.1 DB plans 


The usual type of DB plan provides that an employee will receive a life annuity, beginning 
at a specified normal retirement age, with a periodic payment K. Rules for calculating K are 
specified at the outset. It will normally depend on the employee’s salary and years of service. 
The type of annuity is often a whole life annuity, although there may be options to elect a 
guaranteed period, or a joint-life annuity with another individual, such as a spouse. The income 
of course will be adjusted according to the nature of the annuity selected. 

A typical formula is that K will be equal to r times the average of the employee's last 
h years of salary times the number of years of service, for some specified r and h. As an 
example, suppose that the normal retirement age is 65, r = 0.02, h = 3 and that K is an annual 
payment. Consider an employee who is hired at age 30, retires at age 65, and whose last 
3 years of salary were 89 000, 93 000, 100 000. The employee would then receive an annual 
pension of 0.02 x 35 x 94000 = 65 800. 

A measure which is used to compare various plans is known as the replacement ratio 
and it is simply the ratio of the pension to the final year's salary. In the above example the 
replacement ratio would be 65.8%. 
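
The benefit formula and the replacement ratio are simple enough to express directly in code. The Python sketch below reproduces the example just given. The function name `db_pension` and its parameter names are illustrative.

```python
# Sketch: final-average DB pension and the replacement ratio.
def db_pension(salary_history, years_of_service, accrual_rate, averaging_years):
    """Annual pension = accrual rate x final average salary x years of service."""
    final_salaries = salary_history[-averaging_years:]
    final_average = sum(final_salaries) / len(final_salaries)
    return accrual_rate * final_average * years_of_service


last_salaries = [89_000.0, 93_000.0, 100_000.0]
pension = db_pension(last_salaries, years_of_service=35, accrual_rate=0.02, averaging_years=3)

# Replacement ratio: pension divided by the final year's salary.
replacement_ratio = pension / last_salaries[-1]
```

For the employee in the text this gives a pension of 65 800 and a replacement ratio of 0.658.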

The quantity h will often be between 3 and 5 years. A possible variation involves plans, 
known as career average earnings plans, where h is not fixed, but is equal to the complete 
number of years of service for each employee. 

To evaluate benefits on such a plan the actuary will use an investment discount function $v$ 
and a multiple-decrement table which will typically show decrements of disability, withdrawal 
from service, death and retirement. An additional necessary tool is a so-called salary scale $S$, 
defined as follows. For some minimal age $x_0$, we have $S_{x_0} = 1$ and then for $y > x_0$, 


$S_y$ = the expected ratio of salary at age $y$ to salary at age $x_0$. 


The scale is then used to estimate future salaries. For example, if an employee is earning 
an annual salary of $J$ at age $x$, then an estimate of their annual salary at age $x + h$ will be 
$J\,S_{x+h}/S_x$. 

Pension plans must specify what benefits if any will be paid to those who leave the 
employee group for reasons other than retirement at the normal retirement age. Such reasons 
include disability, death, withdrawal from company employment and retirement at an age 
other than the normal one. We focus now on early retirement. Plans may specify that a person 
may receive pension benefits if they retire early, with some minimum criteria specified, which 
can depend on age and duration of employment. Early retirement means of course that the 
pension is paid for a longer period, and also that contributions into the plan will not be made for 
the remaining time to the normal date, so typically the pension income will be appropriately 
reduced. Rather than calculate the applicable amount of reduction in each case, it is usual to 
work out approximate figures, known as actuarial reduction factors and these are specified 
as part of the provisions of the plan. 

In the following simplified example, provision is made for early retirement up to 2 years 
before the normal date, but not for other forms of decrement. 


Example 13.4 Consider a DB plan which provides a yearly pension of 2% of the final 3-year 
average salary, times the number of years of service, for retirement at ages 63-65, but with a 
reduction of 5% per year should retirement occur before 65. An employee now 45 was hired 
at age 35. His/her present salary is 70 000. The salary scale is given by $S_x = 1.03^{x-30}$, $x \ge 30$. 
(This simply means that salaries are expected to increase by 3% per year.) Find a 
formula for the actuarial present value of the pension benefits. 


Solution. For a person retiring at age 63, the estimated 3-year final average salary is given 
by 


$70\,000[1.03^{15} + 1.03^{16} + 1.03^{17}]/3 = 112\,362$ 


so that the estimated annual pension income, accounting for the actuarial reduction factor of 
10%, is given by $0.90 \times 0.02 \times 28 \times 112\,362 = 56\,630$. 
For a person retiring at age 64 a similar calculation, now using a 5% reduction, results in 
annual pension income of 63 769, and for a person retiring at age 65, the figure is 71 523. 
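The actuarial present value is then given by the following (the superscript $r$ denoting the 
retirement decrement). 


$56\,630\, v(18)\, {}_{18}p^{(\tau)}_{45}\, q^{(r)}_{63}\, \ddot{a}_{63} + 63\,769\, v(19)\, {}_{19}p^{(\tau)}_{45}\, q^{(r)}_{64}\, \ddot{a}_{64} + 71\,523\, v(20)\, {}_{20}p^{(\tau)}_{45}\, \ddot{a}_{65}.$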
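
The three projected pension amounts used in this solution can be reproduced with the Python sketch below. It follows the example's assumptions: a 3% salary scale, the final three salaries earned in the three years before retirement, and a 5% reduction for each year of early retirement. The function names are illustrative. Because the average salary is not rounded along the way, the age-63 figure comes out about half a unit above the 56 630 quoted in the text.

```python
# Sketch: projected annual pension for retirement at ages 63, 64 and 65 (Example 13.4).
def salary_scale(age):
    return 1.03 ** (age - 30)


def projected_pension(retirement_age, current_age=45, hire_age=35, current_salary=70_000.0,
                      accrual_rate=0.02, normal_age=65, reduction_per_year=0.05):
    # Salaries earned in the three years immediately before retirement.
    final_ages = range(retirement_age - 3, retirement_age)
    salaries = [current_salary * salary_scale(a) / salary_scale(current_age)
                for a in final_ages]
    final_average = sum(salaries) / 3
    service = retirement_age - hire_age
    reduction_factor = 1 - reduction_per_year * (normal_age - retirement_age)
    return reduction_factor * accrual_rate * service * final_average


pensions = {age: projected_pension(age) for age in (63, 64, 65)}
```

The values in `pensions` agree with 56 630, 63 769 and 71 523 to within one unit.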


Our solution here incorporates several simplifying assumptions. We have assumed that 
the pension would be paid annually, instead of the more usual monthly arrangement. We have 
also assumed that all employees are hired and retired on their birthdays, and that each salary 
increase occurs on birthdays. It is not too difficult to incorporate more realistic assumptions, 
using the fractional duration techniques in Chapter 7. One common provision is to assume 
hiring, retirements, and salary increases at the middle of each year, that is at ages x + 1/2, 
where x is an integer. 


Funds to provide the income could be provided solely by the employer, or more commonly 
shared between the employees and the employer on some specified basis. A usual provision is 
that the employees will contribute each period an amount of c times their salary for that period. A 
typical figure might be c = 0.05. The employer will then contribute additional amounts which 
are estimated to be sufficient to provide the promised benefits. There are various methods for 
doing so. The subject of funding DB pensions is complex, and will not be discussed further. 


13.3.2 DC plans 


At one time the DB arrangement was the most common one. However periods of low interest 
rate earnings and improving mortality meant that many employers needed to put in more 
money than originally estimated in order to ensure the promised level of benefits. In some 
cases this became prohibitive, resulting in firms switching to the DC mode. In such a pension, 
the employees contribute a certain percentage of their salary, the employer adds an additional 
percentage, and the funds are invested and accumulated as an individual account for each 
participant until retirement. At the time of retirement the total accumulated contributions 
made on the employee's behalf are used to purchase an annuity at the then prevailing interest 
and mortality rates. As with the DB plan there is often a targeted goal, but this is not guaranteed. 
If the investment experience is unfavourable, or mortality has improved so that the cost of life
annuities goes up, the pension income may fall short of the projected amount. Of course, things
can go the other way in a DC plan. Very favourable investment returns can result in higher 
pensions than expected. 

Funding arrangements are usually much easier for the DC plan than in the DB case. A 
common practice is to deposit into the account a certain fraction c of an employee's salary
each year, which could be shared in some way between the employee and employer. One does 
not have to worry as much about benefits for withdrawal or early retirement. Withdrawing 
employees can be given the amount in their account, either in cash or as a deferred annuity. 
For those retiring early the amount can be used to buy an annuity starting on the normal 
retirement date. 

In the following examples, we continue to make the simplifying assumptions noted above.


Example 13.5 For an employee hired at age 35 at a salary of 50 000, it is estimated that 
an amount of 500 000 is needed at the normal retirement age of 65 to buy an appropriate
pension. For exit from the plan before age 65 the accumulated amount of contributions with 
interest is returned to the employee. Find a formula to calculate the contribution rate c, 
assumed to be made at the beginning of each year. 


Solution. The idea is similar to Example 5.8. We solve the following equation to determine c.

\[ 500\,000 = 50\,000\, c\, \mathrm{Val}_{30}(g; v), \tag{13.3} \]

where

\[ g_k = S_{35+k}/S_{35}, \qquad k = 0, 1, \dots, 34. \tag{13.4} \]
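
As a rough numerical illustration of the idea behind (13.3), the following sketch solves for c under assumptions that are not part of the example: a hypothetical salary scale growing at 3% per year, a level annual interest rate of 5%, and 30 annual contributions made at the start of each year. The function name and its parameters are illustrative only.

```python
def contribution_rate(target, starting_salary, years, salary_growth, interest):
    """Return the level fraction c of salary that accumulates to `target`.

    Contribution k (k = 0, ..., years - 1) is c * salary_k, paid at the start
    of year k and accumulated with interest to time `years`.
    """
    accumulated_per_unit_c = 0.0
    for k in range(years):
        # Hypothetical salary scale: level percentage increases each year.
        salary_k = starting_salary * (1 + salary_growth) ** k
        accumulated_per_unit_c += salary_k * (1 + interest) ** (years - k)
    return target / accumulated_per_unit_c


# Illustrative assumptions only: 3% salary growth, 5% interest.
c = contribution_rate(500_000, 50_000, 30, 0.03, 0.05)
print(f"contribution rate c = {c:.4f}")
```

Since exiting employees simply receive their account balance, no survivorship factor enters the accumulation; only the interest assumption and the salary scale matter.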

There are many variations in practice. Many plans have a so-called vesting requirement,
which means that an employee does not receive back the employer’s contributions unless they 
remain in the plan for a minimum period. 


Example 13.6 Suppose that in the situation described above, the contributions are split, 
with the employee paying 40% and the employer the remaining 60%. For participants exiting 
the plan after 5 years, the total of all contributions with interest is returned at the end of the 
year of leaving. For those exiting in the first 5 years only the employee’s own contributions 
with interest are returned. Find an equation to determine the contribution rate c. 


Solution. We use a variation of formula (5.4). 


500 000v(35),s pf" 


dg - AS G) 


where g is as in (13.4) and 


l 5 Y 1905 0524/84. - ORR, 
J I 
oan i)S354i/ S35 if5 < k< 25. 


Exercises 


13.1 A universal life policy provides a death benefit at the end of the period of death equal
to the account value, but subject to a minimum of 30 000. Monthly premiums are 2000.
There is a monthly expense charge of 50 and a further charge of 3% of each premium.
The annual credited interest rate is 6%. The account value at the end of the 30th month
is 27 318. The monthly cost of insurance during the third year of the policy is 15 per
1000. Find $AV_{31}$ and the corresponding COI.


13.2 A universal life policy provides a death benefit at the end of the period of death of
100 000. The account value at time 20 is 2500. For the time period from time 20 to
time 21, the credited interest rate is 0.006, the expense charge is 5% of any premium
paid plus 100, and the cost of insurance rate is 30 per thousand.

(a) Suppose that at time 20, the policyholder makes the minimum premium payment
of 500, which according to the guarantee in the contract ensures that the cost of
insurance will be paid regardless of the amount in the account. Find the account
value at time 21.

(b) If the contract does not carry the secondary guarantee of part (a), what is the
minimum premium that the policyholder will have to pay at time 20 to ensure that
the cost of insurance can be paid?


13.3 At a certain time k, two universal life policies have exactly the same account value,
death benefit, premiums, expenses and cost of insurance rates. In both cases the cost
of insurance discount rate is the same as the credited interest rate. One policy is type
A and the other is type B. Show that the difference between the type A and type B
account values at the end of the year is the same as the difference between the type B
and type A COI for that period, accumulated with interest to the end of the year.


13.4 A universal life policy provides a death benefit equal to the account value plus 50 000.
Monthly premiums are 3000. There is a monthly expense charge of 100 and a further
charge of 4% of each premium. The annual credited interest rate is 8%. The account
value at the end of the 73rd month is 49 200. The monthly cost of insurance for the
following month is 20 per 1000 and is computed at an annual interest rate of 4%. There
is a corridor requirement which specifies that the death benefit must be at least twice
the account value. Find $AV_{74}$ and the corresponding COI.


13.5 Redo Example 13.4, only assume salary increases of 4% per year, and an actuarial
reduction factor of 3% per year.


13.6 Redo Example 13.6, only assuming that an employee who dies during the first 5 years
receives, at the end of the year of death, both their own and the employer's contributions
accumulated with interest.


13.7 An employee starts employment in a firm at age 35 and is offered a choice of either
a DC or a DB plan. In the DC plan, contributions of 15% of salary are accumulated
with interest at 4% until age 65 and then used to buy a life annuity. Under the DB plan,
the employee is given an annual amount, beginning at age 65, of 1.8% of their 3-year
average salary times the number of years of service. If the cost of a 1-unit life annuity
at age 65 is 10, and salaries increase by 4% per year, find the ratio of the annual income
under the DC plan to that of the DB plan, for an employee who remains in the plan
until age 65.


13.8 A DB plan provides for an annual pension of a certain percentage of the final 5 years
average salary times the number of years of service upon retirement. Retirement is
based on the 80 factor, which means that an employee can retire at any time that the
sum of age plus years of service is greater than or equal to 80. An employee age 55 who
began work at age 25 is comparing their income for retirement now to that which they
would receive if they stay for another year. They can expect a 5% increase in salary
for the following year, which is the same as they received in each of the last 4 years.
Find the ratio of the pension income for retirement in 1 year to that for retirement now.


Part II 


THE STOCHASTIC LIFE 
CONTINGENCIES MODEL 


14 


Survival distributions and 
failure times 


14.1 Introduction to survival distributions 


Our goal in this part of the book is to introduce a stochastic model for mortality to replace 
the deterministic model used in Part I. This will not only provide us with a more realistic 
description of human mortality, but it will also have more general applications. 

The basic information a prospective issuer of an insurance or annuity contract wants to 
know is how long the life in question will live. The insurer obviously cannot hope to answer 
this question exactly, since the actual future lifetime lived is random. Some people age 50, 
for example, will live another 40 years or more, while others will die very soon. In the 
deterministic model, we circumvented this issue by assuming that while we could not identify 
how long a particular individual would live, we could identify how many individuals of a 
given age would live to some other age. Clearly, however, the number of such individuals is 
also random. In the stochastic model we will face this randomness directly. 

This and subsequent chapters will require a more advanced knowledge of probability than 
we have assumed so far. We follow the notation and terminology of Appendix A. For the 
present chapter, see in particular Sections A.4—A.8 and note that P will denote probability. 

We do not need to confine ourselves to looking at the time of death of an individual. 
Suppose we are interested in some event that will occur once and only once at some random 
future time. We will refer to this event as ‘failure’. The random variable T, the time of 
occurrence of such an event, is known as a failure time, and its distribution is often referred 
to as a survival distribution. At any time before ‘failure’ we will say we are in a state of 
‘survival’. 

Our motivating example is the case where the event in question is the death of (x). In this 
case we will denote T by T(x). 

There are, however, many other examples of interest. Suppose a manufacturer sells a
product with a guarantee that it will be replaced if it fails before a certain time. In order
to assess the cost of the guarantee the manufacturer wants to know the distribution of the
product's failure times. We see that a guarantee can be viewed as a type of insurance policy.

We have already encountered more general failure times in Chapter 10. Joint-life insurance 
and annuities were seen to be the same type of contracts as those for single lives except that 
the failure time was defined to be the time of first death. Similarly, for last-survivor annuities 
or second-death insurances, the failure time was the time of second death. Multiple-decrement 
theory provided many more examples of failure times. 


14.2 The discrete case 


Consider the case where failure can occur only at integer times 1, 2,3,..., so T is a discrete 
random variable with positive integers as values. Refer in particular to Section A.4. In place of 
the cumulative distribution function F, it is often more convenient to use the survival function 
s defined by 


s(k) = 1— F(k) = P(T > k). 


This gives the probability that failure has not yet occurred by time k, or in other words that 
we are still in a state of survival at time k. If f is the probability function of T, it is clear from 
(A.5) that 


\[ f(k) = s(k-1) - s(k), \qquad s(k) = \sum_{i=k+1}^{\infty} f(i) = 1 - \sum_{i=1}^{k} f(i). \tag{14.1} \]

There is another important method of describing the distribution of T.


Definition 14.1 The hazard function of T at time k, denoted by λ(k), is the conditional
probability of failure at time k given survival up to time k − 1. That is,

\[ \lambda(k) = \frac{f(k)}{s(k-1)}. \]


This of course is defined only for those integers k such that s(k — 1) > 0. Once s(k — 1) 
equals 0, there is no possibility of survival up to that point. (The hazard function is also known 
as the hazard rate or intensity function or failure function.) 

Readers should make sure that they understand the difference between f(k) and λ(k). Both
quantities give the probability of failure at time k, but from different perspectives. If, at time
0, we are asked to assess the likelihood that failure will occur at time k, the answer is just f(k).
As time goes on, our assessment of this likelihood must change. For example, if failure takes
place before time k, then we know that the probability of failure at time k is zero. Suppose
that at time k − 1, failure has not yet occurred, and we ask ourselves the same question. The
answer now is λ(k). The difference between the two functions is further clarified by writing
the definition of λ in the form

\[ f(k) = s(k-1)\lambda(k), \tag{14.2} \]

expressing the fact that we view failure at time k as arising from two events. First, there must
be survival up to time k − 1, and then, given this survival, failure must occur at time k.

In deriving an appropriate survival distribution, it is often easier to model the hazard rate
rather than s or f. Given λ, we can then easily determine the other functions as follows.
Equating the two different expressions for f(k) given in (14.1) and (14.2),

\[ s(k) = s(k-1)[1 - \lambda(k)]. \tag{14.3} \]

Beginning with s(0) = P(T > 0) = 1, we know that s(1) = 1 − λ(1), s(2) = s(1)[1 − λ(2)] =
[1 − λ(1)][1 − λ(2)], and proceeding inductively,

\[ s(k) = [1 - \lambda(1)][1 - \lambda(2)] \cdots [1 - \lambda(k)]. \tag{14.4} \]

Example 14.1 A bag contains 3 green balls and 1 red ball. A ball is drawn at random. If
green, it is replaced and the draw is repeated. Failure occurs when a red ball is drawn. If this
is on the kth draw we say that failure occurs at time k. Find s(k), f(k) and λ(k).


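Solution. From the conditions of the problem we can immediately deduce that

\[ \lambda(k) = \frac{1}{4}, \qquad k = 1, 2, \dots. \]

From (14.4),

\[ s(k) = \left(\frac{3}{4}\right)^{k}, \]

and from (14.2),

\[ f(k) = \left(\frac{3}{4}\right)^{k-1} \frac{1}{4}, \qquad k = 1, 2, \dots. \]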

We have the well-known geometric distribution. See Section A.11.3. As a check we can also 
compute f(k) from the first formula in (14.1), which of course yields the same answer. 
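
The recursion from hazard to survival and probability functions is easy to carry out mechanically. The sketch below, in which the function name is our own and exact fractions are used to avoid rounding, builds s and f from an arbitrary discrete hazard function using (14.2) and (14.3), and applies it to the constant hazard of Example 14.1.

```python
from fractions import Fraction


def survival_and_probability(hazard, n):
    """Build s(0..n) and f(1..n) from a discrete hazard function.

    Uses f(k) = s(k-1) * hazard(k)        (14.2)
    and  s(k) = s(k-1) * (1 - hazard(k))  (14.3).
    """
    s = [Fraction(1)]   # s(0) = 1
    f = [Fraction(0)]   # placeholder so that f[k] lines up with f(k)
    for k in range(1, n + 1):
        lam = hazard(k)
        f.append(s[k - 1] * lam)
        s.append(s[k - 1] * (1 - lam))
    return s, f


# Example 14.1: constant hazard 1/4 (one red ball among four).
s, f = survival_and_probability(lambda k: Fraction(1, 4), 5)
for k in range(1, 6):
    print(k, s[k], f[k])
```

Replacing the lambda with any other hazard function, such as one that changes with k, gives the corresponding distribution without further algebra.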


14.3 The continuous case 


In most applications, the time of failure is not restricted to the integers but can be arbitrary. 
We model this by assuming that T is a continuous random variable with values in [0, ∞).


14.3.1 The basic functions


We define the survival function s as in the discrete case,

\[ s(t) = 1 - F(t) = P(T > t), \]

the probability of survival up to time t. It is related to the probability density function (p.d.f.) f
by

\[ f(t) = -s'(t), \qquad s(t) = \int_t^{\infty} f(r)\,dr = 1 - \int_0^t f(r)\,dr. \tag{14.5} \]


Definition 14.2 The hazard function of a continuous failure time T at time t, denoted by
μ(t), is the conditional density function for failure at time t given survival up to that point. It
is given by

\[ \mu(t) = \frac{f(t)}{s(t)} \]

for all t such that s(t) > 0.


For small Δt, μ(t)Δt approximates the probability that T takes a value in the interval
[t, t + Δt] given survival up to time t. Analogously to (14.2), we have the expression

\[ f(t) = s(t)\mu(t). \tag{14.6} \]

To determine the other quantities from μ we note from (14.5) and (14.6) that

\[ \mu(t) = \frac{f(t)}{s(t)} = -\frac{s'(t)}{s(t)} = -\frac{d}{dt} \log s(t). \]

Following the proof of Proposition 8.1, we deduce that

\[ s(t) = e^{-\int_0^t \mu(r)\,dr}. \tag{14.7} \]


Formula (14.7) seems quite different from its discrete counterpart (14.4), but if you look
at it in the right way, it really is a natural continuous version. Recall that $e^{-\sum x_i} = \prod e^{-x_i}$.
So if we think of an integral as a type of generalized sum, then e to the integral is a type
of generalized product of terms of the form $e^{-\mu}$, which for 'small' values of μ are close
to 1 − μ.
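
The analogy between (14.4) and (14.7) can be checked numerically. In the sketch below, the hazard μ(t) = 2t, the step count and the function names are our own illustrative choices; for this hazard s(t) = exp(−t²) in closed form, which gives something to compare against.

```python
import math


def survival_from_integral(mu, t, steps=10_000):
    """s(t) = exp(-integral of mu over [0, t]), using the midpoint rule."""
    h = t / steps
    integral = sum(mu((i + 0.5) * h) for i in range(steps)) * h
    return math.exp(-integral)


def survival_product(mu, t, steps=10_000):
    """Discrete analogue of (14.4): product of (1 - mu * h) over small steps."""
    h = t / steps
    product = 1.0
    for i in range(steps):
        product *= 1.0 - mu((i + 0.5) * h) * h
    return product


mu = lambda t: 2.0 * t   # illustrative hazard with s(t) = exp(-t**2)
t = 1.5
print(survival_from_integral(mu, t), survival_product(mu, t), math.exp(-t ** 2))
```

The product form treats each short interval as a discrete period with hazard μ·h, and it approaches the exponential form as the number of steps grows.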


14.3.2 Properties of u 


We know that the density function f must be nonnegative and satisfy $\int_0^{\infty} f(t)\,dt = 1$, and that the
distribution function F must be nondecreasing and satisfy F(0) = 0, $\lim_{t \to \infty} F(t) = 1$. The latter
implies that the survival function s is nonincreasing and satisfies s(0) = 1, $\lim_{t \to \infty} s(t) = 0$.


What are the corresponding properties for the hazard function? The hazard function μ must
be nonnegative on its domain and in addition satisfy

\[ \int_0^{\infty} \mu(t)\,dt = \infty. \]

To see this, suppose that the above integral had a finite value a. From (14.7) we would
deduce that $\lim_{t \to \infty} s(t) = e^{-a}$, which is not equal to 0, contradicting the fact that failure must
occur at some time. In other words, if the integral is finite, the hazard function is not large 
enough to guarantee failure and there would be some chance of surviving forever. Of course, 
we may want to model situations where there is a chance that failure will never occur, and in 
this case we would want the above integral to be finite. 


14.3.3 Modes 


The general shape of the density function of T can often be inferred by looking at μ. In
particular, we may be interested in modes, which are points where the density function assumes
a local maximum. We assume that f is differentiable, and note from (14.5) and (14.6) that

\[ f' = (s\mu)' = s'\mu + s\mu' = -f\mu + s\mu' = -s\mu^2 + s\mu' = s(\mu' - \mu^2). \tag{14.8} \]

We see then that f is increasing or decreasing according as μ′ is greater than or less than μ².
Points at which modes can occur are restricted to those values of t for which μ′(t) = μ(t)²,
or possibly to endpoints of the domain. We will look at some particular examples later.


14.4 Examples 


In this section we survey some familiar distributions that can be used to model failure times 
in certain situations. 

The family of exponential distributions (see Section A.11.6) can be easily described as
those with a constant hazard function. If μ(t) = μ for all t, then from (14.6) and (14.7),
$s(t) = e^{-\mu t}$ and $f(t) = \mu e^{-\mu t}$. Such a distribution is suitable for modelling situations where the chance of failure
in the next instant remains constant regardless of the time. This is shown by the constancy of
the hazard function as well as the property given in (A.57). It is not suitable for modelling human
mortality, where the aging process means that we would expect the hazard function to increase 
with time. See Section 8.10.1 where we discussed the same point in the deterministic model. 

Another overly simplistic approach is to take a continuous uniform distribution (see Section
A.11.4). For most applications, this is not realistic, but it has been used as a rough
approximation to model human mortality. This is simply the stochastic version of Demoivre's law,
introduced previously in Chapter 8. Note that it does give an increasing hazard function since,
for T uniform on [0, N],

\[ \mu(t) = \frac{f(t)}{s(t)} = \frac{1}{N - t}, \qquad 0 \le t < N. \]


The first serious attempt to capture some of the relevant features for mortality was the 
Gompertz distribution given by 


\[ \mu(t) = Bc^{t}, \quad \text{for some parameters } B > 0 \text{ and } c > 1. \]

This was introduced by Benjamin Gompertz in 1825. His idea was that the hazard function
for human mortality should increase with age at a rate proportional to itself, that is, at an 
exponential rate. It has been used extensively for modelling mortality in both humans and 
other species. For human mortality, a good fit can usually be found over the middle span of 
ages by choosing the parameters appropriately. 

One feature of this distribution is that μ′(t) − μ(t)² = μ(t)[log c − μ(t)] is positive, zero,
or negative, precisely when μ(t) is less than, equal to, or greater than log c. Then, (14.8) shows
that if μ(0) > log c, the density function is decreasing, while if μ(0) < log c, there is a unique
mode at the point t where μ(t) = log c. The density function increases up to this point and
then decreases. This does indeed seem to fit the pattern of human mortality over the middle 
range of ages. Starting about age 30, the probability of dying at age x will increase with x up 
to a certain point. There is clearly more chance of dying at age 70 than at age 30. However, 
at sufficiently high ages the chance of dying starts to decrease with age. For example, there is 
a very small chance of dying at age 110, for the simple reason that very few people will live 
that long in the first place. It may be enlightening to look again at (14.6), which expresses f 
as the product of two functions, one of which is decreasing, and one increasing. 

The Gompertz distribution does not fit well with observed mortality at ages below 30 
or above 70 or so, for the reasons explained at the beginning of Section 8.10.1. In 1860, 
a modification was advocated by Makeham. He suggested that a constant be added to the 
hazard function to cover causes of death that were age-independent, such as accidents. The 
Makeham distribution, often referred to as the Gompertz-Makeham distribution, is then given 
by $\mu(t) = A + Bc^{t}$, for some A > 0. The extra parameter allows for more flexibility in shape
(see Exercise 14.12) but still does not capture observed mortality behaviour at very young or 
very old ages. 
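
To make the discussion of the Gompertz mode concrete, the sketch below compares the point where μ(t) = log c with a brute-force search for the maximum of the density f = sμ. The parameter values B = 0.0001 and c = 1.1 are hypothetical, chosen only so that μ(0) < log c, and the grid search is our own simple check rather than a recommended numerical method.

```python
import math

B, c = 0.0001, 1.1   # hypothetical Gompertz parameters with B < log(c)


def gompertz_density(t):
    """f(t) = s(t) * mu(t) for the Gompertz hazard mu(t) = B * c**t."""
    mu = B * c ** t
    # Integral of B * c**r over [0, t] is B * (c**t - 1) / log(c).
    s = math.exp(-B * (c ** t - 1) / math.log(c))
    return mu * s


# Mode from the condition mu(t) = log c.
mode_formula = math.log(math.log(c) / B) / math.log(c)

# Brute-force check on a grid from 0 to 120 in steps of 0.01.
step = 0.01
grid = [i * step for i in range(int(120 / step) + 1)]
mode_grid = max(grid, key=gompertz_density)

print(mode_formula, mode_grid)
```

With these parameters the mode falls at roughly 72, which, if time is measured from a starting age of 0, is in line with the qualitative description of the density rising through the middle ages and falling at high ages.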


14.5 Shifted distributions 


Suppose we are given a failure time T. (For convenience we will deal with the continuous 
case, but the conclusions for the discrete case are similar.) As remarked above, having reached 
a point u at which failure has not yet occurred, we must alter our assessment of the likelihood 
of failure at different times. To formalize this, we define a new random variable. 

The random variable T ∘ u is equal to the time until failure occurs, as measured from
time u, given survival up to time u. Therefore, it takes a value of T − u when T takes a value
greater than u, with probabilities conditioned on the event that T is greater than u. The survival
function, density function, and hazard rate function, respectively, for T ∘ u are expressed in
terms of the corresponding functions for T as follows. (Since we are dealing with more than
one distribution here, we will insert appropriate subscripts on s, f and μ.)

\[ s_{T \circ u}(t) = \frac{s_T(u + t)}{s_T(u)}, \tag{14.9} \]

\[ f_{T \circ u}(t) = -\frac{d}{dt}\, s_{T \circ u}(t) = \frac{f_T(u + t)}{s_T(u)}, \tag{14.10} \]

\[ \mu_{T \circ u}(t) = \frac{f_{T \circ u}(t)}{s_{T \circ u}(t)} = \mu_T(u + t). \tag{14.11} \]

Formula (14.10) provides a nice way to visualize this concept. To get the graph of the
density function of the shifted random variable, we take the graph of the original density 
function, ignore everything to the left of u, and scale all values upward to get the total area 
equal to 1. 

For some distributions, the shifted random variables are of the same family as the original. 
This makes these distributions convenient for modelling purposes. 


Example 14.2 Describe the random variables T ∘ u when (a) T is an exponential random
variable with constant hazard μ, (b) T is uniform on the interval [0, N], (c) T is
Gompertz-Makeham with parameters A, B and c.


Solution. In all cases it is convenient to use the hazard function and (14.11).


(a) $\mu_{T \circ u}(t) = \mu(u + t) = \mu$, showing that T ∘ u has the same distribution as T. This
indicates the so-called memoryless property of exponential random variables. The time
until the event in question occurs will not be affected by how long you have already
waited.


(b) $\mu_{T \circ u}(t) = \mu(u + t) = 1/(N - u - t)$, showing that T ∘ u is uniform on the interval
[0, N − u].


(c) $\mu_{T \circ u}(t) = A + Bc^{u}c^{t}$, showing that T ∘ u is also Gompertz-Makeham with the B
parameter changed to $Bc^{u}$.
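
Part (c) can also be confirmed numerically through the survival functions. The sketch below uses hypothetical parameter values and our own function names; it compares the shifted survival function from (14.9) with the survival function of a Gompertz-Makeham distribution whose B parameter has been multiplied by c to the power u.

```python
import math


def makeham_survival(t, A, B, c):
    """Survival function for the hazard mu(t) = A + B * c**t."""
    return math.exp(-A * t - B * (c ** t - 1) / math.log(c))


def shifted_survival(t, u, A, B, c):
    """Survival function of the shifted variable, from (14.9)."""
    return makeham_survival(u + t, A, B, c) / makeham_survival(u, A, B, c)


A, B, c, u = 0.0005, 0.0001, 1.1, 40.0   # hypothetical parameters
for t in (1.0, 10.0, 25.0):
    lhs = shifted_survival(t, u, A, B, c)
    rhs = makeham_survival(t, A, B * c ** u, c)
    print(t, lhs, rhs)
```

The two columns agree to rounding error, which is the practical meaning of saying that the family is closed under shifting.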


14.6 The standard approximation 


Associated with every continuous failure time T is a discrete failure time $\tilde{T}$. Failure according
to $\tilde{T}$ will occur at the integer time k, if failure under T has occurred in the time interval
(k − 1, k]. To be precise,

\[ \tilde{T} = [T] + 1, \quad \text{where } [\,\cdot\,] \text{ denotes the greatest integer function.} \]

Let f and s denote the density function and survival function, respectively, of T. Let $\tilde{f}$ and
$\tilde{s}$ denote the corresponding functions for $\tilde{T}$. Then, for all positive integers k,

\[ \tilde{f}(k) = \int_0^1 f(k - 1 + t)\,dt = s(k - 1) - s(k), \tag{14.12} \]

\[ \tilde{s}(k) = s(k). \tag{14.13} \]


Suppose we want to deduce information about T by observing when failure occurs. It
could be difficult or impossible to observe continuously and we may only be able to view the
situation at certain discrete times, say 1, 2, 3, etc. If we observe at time k, and see that failure
has occurred, we know only that failure occurred between time k − 1 and k. In other words,
we are observing values of $\tilde{T}$. We would like therefore to infer the distribution of T from that
of $\tilde{T}$. Clearly, we cannot do this exactly and must make some type of approximation. A simple
method, consistent with (14.12), is just to assume that for all nonnegative integers k, and any
t in the interval (0,1),

\[ f(k + t) = \tilde{f}(k + 1). \tag{14.14} \]

We will refer to this as the standard approximation. In fact, we have already used this
approximation in our deterministic model, in the form of the UDD assumption for the random
variable T(x), as indicated by the equivalent formulation in terms of survival functions,
namely,

\[ s(k + t) = (1 - t)s(k) + ts(k + 1), \tag{14.15} \]

for an integer k and 0 < t < 1. To see this equivalence, note that, given (14.15), we obtain
(14.14) by differentiating with respect to t. Conversely, given (14.14), we have

\[
\begin{aligned}
s(k + t) &= s(k) - \int_0^t f(k + r)\,dr \\
&= s(k) - \int_0^t \tilde{f}(k + 1)\,dr = s(k) - t\tilde{f}(k + 1) \\
&= s(k) - t[s(k) - s(k + 1)].
\end{aligned}
\]

For the purpose of calculating moments, it is useful to introduce the random variable

\[ R = \tilde{T} - T, \]

the duration from the time of failure until the end of the year of failure. Given a value r in the
interval (0,1), consider the probability that R < r given that $\tilde{T} = k$. This is the probability that
T takes a value between k − r and k, given that $\tilde{T} = k$. Using the standard approximation, this
probability is given by

\[ \frac{1}{\tilde{f}(k)} \int_{k-r}^{k} f(t)\,dt = \frac{1}{\tilde{f}(k)} \int_{k-r}^{k} \tilde{f}(k)\,dt = r. \]


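This shows that, under the standard approximation, R is independent of $\tilde{T}$ and has a uniform
distribution on the interval (0,1). Using standard results about the uniform distribution, we can
calculate that under the standard approximation

\[ E(T) = E(\tilde{T}) - E(R) = E(\tilde{T}) - \tfrac{1}{2}, \tag{14.16} \]

\[ \mathrm{Var}(T) = \mathrm{Var}(\tilde{T}) + \mathrm{Var}(R) = \mathrm{Var}(\tilde{T}) + \tfrac{1}{12}, \tag{14.17} \]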


where E denotes expectation and Var denotes variance. 


14.7 The stochastic life table


The goal in this section is to establish a link between the stochastic model and the deterministic 
model for mortality. In this and subsequent chapters we will use this link to show that the 
questions that we answered in the deterministic model can be answered in the stochastic 
model in the very same way. In addition, we will be able to answer more questions. 

The tool for doing so will be to return to the fundamental concept of a life table but look
at it in a different way. As we did before, we start out with an arbitrary number $\ell_0$ of newly
born lives. We now let $L_x$ be the number of these still alive at age x, and $D_x$ be the number
of these who will die between the ages of x and x + 1. These quantities look like the $\ell_x$ and
$d_x$ of Chapter 3, but we now recognize that these are not numbers but random variables. We
obtain the life table by letting

\[ \ell_x = E(L_x), \qquad d_x = E(D_x). \]

Since it is clear that $L_{x+1} = L_x - D_x$, we have the same formula (3.1) as we had before. In
other words, in our stochastic model we still can introduce a life table, but we now view the
numbers as expected values rather than as an exact account of the numbers who will live or die.

Consider now the random variable T(x), the time to the death of (x). We will write
$s_x(t)$, $f_x(t)$, $\mu_x(t)$ for the survival function, density function, and hazard function, respectively,
of T(x).

For age 0, there is a somewhat different standard notation, which can cause some confusion
if care is not taken. The random variable T(0) is traditionally denoted by X, and the variable
denoted by x rather than t, since T(0) is really the age at death. It is common to suppress the
0 and just write s(x), f(x) and μ(x) for $s_0(x)$, $f_0(x)$ and $\mu_0(x)$, respectively.

A key observation is that 


\[ \ell_x = \ell_0 s(x). \tag{14.18} \]


The argument is the same as showing that if you flip N coins, each with a probability p of
coming up heads, you can expect to get Np heads. In this case, if you take $\ell_0$ people, each with
a probability s(x) of surviving to age x, then you can expect to get $\ell_0 s(x)$ such survivors.
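
The coin-flipping argument can be verified by direct summation. The sketch below assumes, as the analogy suggests but the text does not state explicitly, that the lives survive independently, so that the number of survivors has a binomial distribution. The values of ℓ₀ and s(x) are hypothetical.

```python
from math import comb


def expected_survivors(n, p):
    """Mean of a Binomial(n, p) count, summed directly from its probabilities."""
    return sum(k * comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1))


l0, s_x = 100, 0.9   # hypothetical values of l_0 and s(x)
mean = expected_survivors(l0, s_x)
print(mean)          # agrees with l0 * s_x up to rounding
```

The actual number of survivors will vary around this mean, which is exactly the randomness that the deterministic life table ignores.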

We are now ready for the main purpose of this section, which is to establish a correspon- 
dence between the survival, hazard, and density functions of T(x) and the life table functions 
for the stochastic life table. At first, we assume aggregate mortality, as in Chapter 3, where 
we used a single-life table for the future mortality of all ages x, by just starting at age x. This 
corresponds stochastically to the statement 


T(x) has the distribution of T(0) ∘ x,


which simply says that the future lifetime of (x) is the future lifetime after time x of a person
who was age 0, given that they have lived to age x. There are therefore no effects of selection.
Suppose also that we calculate the quantities ${}_tp_x$, ${}_tq_x$, μ(x) starting from the life table,
exactly as we did in the deterministic model. We then have from (14.9) and (14.18) that

\[ s_x(t) = \frac{s(x + t)}{s(x)} = \frac{\ell_{x+t}}{\ell_x} = {}_tp_x. \tag{14.19} \]

This is certainly reasonable. Although looked at from different points of view, both of these
are probabilities that a person age x will survive t years. It follows immediately that

\[ F_x(t) = 1 - {}_tp_x = {}_tq_x \tag{14.20} \]

and that the hazard rate function of T(x) is given by

\[ \mu_x(t) = -\frac{d}{dt} \log {}_tp_x = \mu(x + t), \tag{14.21} \]

where the last quantity is the force of mortality of Chapter 8. (Note that in the case of hazard
rates, the notation in the deterministic model was chosen from the outset to correspond to that
in the stochastic setting.) Finally,

\[ f_x(t) = {}_tp_x\, \mu(x + t), \tag{14.22} \]


the familiar expression that we used for insurance premiums in Chapter 8. 
If we assume the strictly select mortality of Chapter 9, then we cannot assume that T(x) is 
distributed as X ∘ x. The above correspondences would be modified as follows:

\[ s_x(t) = {}_tp_{[x]} \]

and

\[ f_x(t) = {}_tp_{[x]}\, \mu_{[x]}(t). \]


It is also instructive to compute the distribution of multiple-life failure times in terms
of the quantities introduced in the discrete model. For example, consider T(xy), the time of
failure of the joint-life status. For this random variable, the survival function is just ${}_tp_{xy}$, the
hazard function is $\mu_{xy}(t)$, and the density function is ${}_tp_{xy}\, \mu_{xy}(t)$.


14.8 Life expectancy in the stochastic model 


Let $\tilde{T}(x)$ be the discrete random variable associated with T(x) as defined in Section 14.6. It
will have survival, probability and hazard functions given by

\[ \tilde{s}_x(k) = {}_kp_x, \tag{14.23} \]

\[ \tilde{f}_x(k) = {}_{k-1}p_x - {}_kp_x = {}_{k-1}p_x\, q_{x+k-1}, \tag{14.24} \]

\[ \lambda_x(k) = q_{x+k-1}. \tag{14.25} \]


Traditional actuarial terminology defines a discrete random variable K(x) known as the 
curtate future lifetime, which measures the whole number of future years to be lived by (x). It 
is clear that 


\[ K(x) = \tilde{T}(x) - 1. \]

From (14.24), (3.8), (A.12), and the fact that $\tilde{s}_x(0) = 1$, we can write

\[ e_x = \sum_{k=1}^{\omega - x - 1} \tilde{s}_x(k) = E[\tilde{T}(x)] - \tilde{s}_x(0) = E[K(x)]. \]


Therefore life expectancy is indeed an expectation in the sense of probability theory. The 
formal definition of complete life expectancy in the stochastic model is 


\[ \mathring{e}_x = E[T(x)]. \]


From (A.15) we see that this agrees with our Chapter 8 definition. Moreover, using the standard 
approximation and (14.16), we verify the fact that 


\[ E[T(x)] = e_x + \tfrac{1}{2}. \]


We can already see one advantage of the stochastic model. As well as computing the 
expected future lifetime, we can compute the variance of future lifetime. 


Example 14.3 Given $\ell_{90} = 100$, $\ell_{91} = 90$, $\ell_{92} = 70$, $\ell_{93} = 40$, $\ell_{94} = 10$, $\ell_{95} = 0$,
compute the expectation and variance of T(90), assuming UDD.


Solution. Using either (A.7) or (A.16) we find that $E[\tilde{T}(90)] = 3.1$, $E[\tilde{T}(90)^2] = 10.9$, so
that $\mathrm{Var}[\tilde{T}(90)] = 1.29$. Then, from (14.16) and (14.17),

\[ \mathring{e}_{90} = 2.6, \qquad \mathrm{Var}[T(90)] = 1.373. \]
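
The figures in Example 14.3 can be reproduced with a few lines of code. The sketch below uses exact fractions; the variable names are our own. It first applies (14.16) and (14.17), and then repeats the calculation directly from the UDD density as an independent check.

```python
from fractions import Fraction

# Life table values from Example 14.3, keyed by age.
ell = {90: 100, 91: 90, 92: 70, 93: 40, 94: 10, 95: 0}

# Probability function of the discrete failure time: f(k) = d_{90+k-1} / l_90.
f = {k: Fraction(ell[89 + k] - ell[90 + k], ell[90]) for k in range(1, 6)}

mean_discrete = sum(k * p for k, p in f.items())
var_discrete = sum(k * k * p for k, p in f.items()) - mean_discrete ** 2

# Standard approximation, (14.16) and (14.17).
mean_T = mean_discrete - Fraction(1, 2)
var_T = var_discrete + Fraction(1, 12)

# Direct check: under UDD the density of T equals f(k) on (k - 1, k), so
# E[T] uses the interval midpoint and E[T^2] uses the integral of t^2.
mean_direct = sum(p * (k - Fraction(1, 2)) for k, p in f.items())
second_moment = sum(p * (k * k - k + Fraction(1, 3)) for k, p in f.items())
var_direct = second_moment - mean_direct ** 2

print(float(mean_T), float(var_T))
```

Both routes give the same exact values, 13/5 for the mean and 103/75 for the variance, the latter rounding to 1.373.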


14.9 Stochastic interest rates 


Stochastic interest is a large topic which we will only allude to briefly. As well as the life
table, the other main ingredient of the actuary's toolkit, the discount function, should be
treated stochastically. For the most part, the method of doing so in the pricing and valuation
of insurance and annuity contracts has been by a simulation technique. Various scenarios of
possible future interest rates are selected, together with some weighting as to how likely
their occurrence is. Premiums or reserves are computed using each scenario, exactly as we
have described, and the totality of results is analyzed. In this way, one can arrive at estimates
of the expected value of the quantities, or in some cases, particularly for reserve purposes, an
idea of the worst case scenarios.
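
A toy version of the scenario method might look as follows. Everything here is hypothetical: the three level interest rates, their weights, and the use of a 10-year annuity-certain in place of a genuine life contingent liability are chosen only to keep the sketch short.

```python
def annuity_due_pv(rate, n):
    """Present value of n annual payments of 1, the first payable immediately."""
    v = 1 / (1 + rate)
    return sum(v ** k for k in range(n))


# Hypothetical scenarios: (level annual interest rate, weight).
scenarios = [(0.02, 0.25), (0.04, 0.50), (0.06, 0.25)]

# Value the same liability under each scenario.
values = [annuity_due_pv(rate, 10) for rate, _ in scenarios]

# Weighted average over scenarios, and the most adverse outcome.
expected_value = sum(w * pv for (_, w), pv in zip(scenarios, values))
worst_case = max(values)   # the largest liability arises at the lowest rate

print(values, expected_value, worst_case)
```

In practice the scenarios would be full paths of future rates rather than level rates, and the valuation under each would include mortality, but the structure of the calculation is the same.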

A more sophisticated approach is to model the values v(s, t) as random variables rather than definite
numbers, as we mentioned in Chapter 2. Some aspects of this idea are dealt with in Chapter 20.
Numerous stochastic interest rate models have been proposed. The most popular approach is
to express the force of interest δ(t) as a stochastic process rather than a function of t.


Notes and references


Standard actuarial terminology, which is based on deterministic thinking, refers to Gompertz’s 
law, rather than to a Gompertz distribution. This is stated as $\mu(x) = Bc^x$. In our framework this
means that T(0) has a Gompertz distribution and $T(x) = T(0) \circ x$, so that T(x) has a Gompertz
distribution for all x. Similar remarks apply to the Makeham modification. 

It was common practice at one point to use life tables that satisfied Gompertz’s or 
Makeham’s law. A major motivation was to simplify joint-life calculations, by reducing 
joint-life statuses to a single-life or a joint-life status with equal ages. See Exercise 14.10. 
There is little need for this with modern computing methods. 

Additional material on Gompertz, Makeham and more general mortality distributions can 
be found in Brillinger (1961), Carriére (1994a) and Tennenbein and Vanderhoof (1980). 


Exercises 


Type A exercises 


14.1 Redo Example 14.1, but now assume that each time a green ball is drawn, it is 
replaced with a red ball. In this case what is the most likely time of failure? 


14.2 Redo Example 14.1, but now assume that each time a green ball is drawn, a new red 
ball is added. 


14.3 A failure time T has the hazard function $\mu(t) = (1 + t)^{-2}$. What is the probability
that failure never occurs? 


14.4 Suppose that $q_{97} = 0.5$, $q_{98} = 0.6$, $q_{99} = 0.8$, $q_{100} = 1$, and that UDD holds. Find
(a) E[T(97)] and (b) Var[T(97)]. 


14.5 If De Moivre's law holds with $\omega = 100$, find the variance of T(60).


Type B exercises


14.6 Consider two copying machines. Machine 1 will make 20 000 copies per month.
When new, it will last T months, where T is a random variable with hazard function

\[ \mu_T(t) = \frac{1}{40 - t}, \quad 0 < t < 40. \]

Machine 2 will make 15 000 copies per month. When new, it will last for S
months, where S is a random variable with hazard function

\[ \mu_S(t) = \frac{2}{96 - t}, \quad 0 < t < 96. \]


A prospective purchaser is trying to decide between buying a new machine 1, or 
a 2-year-old machine 2. Which of these two choices will produce, on average, the 
largest total number of copies during its lifetime? 


14.7 Suppose that T has an exponential distribution with constant hazard 0.2.

(a) What is the probability that $\tilde{T} = 3$?

(b) What is the error made if we approximate the probability that 2.5 < T < 2.75 by
using the standard approximation?


14.8 Consider the distribution with density function $f(t) = \beta^2 t e^{-\beta t}$, t > 0. (This is a
gamma distribution with first parameter 2.)

(a) Show that $s(t) = (1 + \beta t)e^{-\beta t}$.

(b) Find the hazard function $\mu(t)$.

(c) Describe precisely the shifted distribution $T \circ u$. (Hint: It is a mixture of two
well-known distributions.)


14.9 For two independent lives (x) and (y), (x) is subject to a force of mortality of
$\mu_x(t) = 2/(10 - t)$ while (y) is subject to the survival function $s_y(t) = 1 - (t/10)$, in
both cases for 0 < t < 10. Find the p.d.f. for the following random variables.

(a) $T(xy)$, the time of the first death;

(b) $T(\overline{xy})$, the time of the second death.


14.10 (The law of uniform seniority)

(a) Suppose that Gompertz's law holds (see Section 14.9). Show that there is a function
g, defined on the nonnegative reals, such that

\[ {}_tp_{x:x+n} = {}_tp_{x+g(n)}, \quad \text{for all } t > 0. \]

(b) Suppose that Makeham's law holds. Show that there is a function h, defined on
the nonnegative reals, such that

\[ {}_tp_{x:x+n} = {}_tp_{x+h(n):x+h(n)}, \quad \text{for all } t > 0. \]


14.11 Show that the sample table of Section 3.7 satisfies Gompertz's law (that is, up to age
119, where a modification was made to get a finite value of $\omega$).


*14.12 Suppose that T has a Gompertz-Makeham distribution with A > 0. Show that $f_T$ is
either (i) decreasing, (ii) increasing then decreasing or (iii) bimodal with two local
maximums. Identify the conditions on the parameters that will result in each case.


14.13 For each of the following two families of distributions, decide whether or not the
shifted distributions are from the same family, with changed parameters.

(a) $\mu(t) = \alpha(t + \theta)^{-1}$ for positive $\alpha$ and $\theta$ (Pareto);

(b) $\mu(t) = kt^r$ for positive k and r (Weibull).


15 


The stochastic approach to 
insurance and annuities 


15.1 Introduction 


In this chapter we deal with a perfectly general failure time T, and develop the stochastic 
approach for calculating premiums and reserves for insurance and annuity contracts based 
on T. These will all be calculated as expected values of appropriate random variables. In 
the particular case where T = T(x), we will show, using the correspondence established in
Chapter 14, that these agree with the results that we obtained in the deterministic model. The
advantage of the stochastic approach is that we can augment these expected values with other 
quantities, such as variances. 

Throughout the chapter, f, s and $\mu$ will denote the density, survival and hazard functions
respectively of T. We let $\tilde{T}$ be the associated discrete failure time, $\lfloor T \rfloor + 1$. We will let $\tilde{f}$ and $\lambda$
denote the probability function and hazard function respectively of $\tilde{T}$. We denote the survival
function of $\tilde{T}$ with the same symbol s. This will not cause confusion since the value at any
integer is the same in both cases.

In some cases the values of T are bounded (e.g., the values of T(x) are bounded by $\omega - x$),
but this is not necessary. The upper limit on integrals or sums will be written as $\infty$ to cover
all possibilities.

The approach differs somewhat from the deterministic case and it is important to understand
the distinction. We suppose, as we did before, that we have a fixed deterministic
investment discount function v. We will consider contracts that pay certain benefits provided 
that failure has not occurred, and/or benefits at the time of failure. We will want to determine 
present values of these benefits, and these will all be computed with respect to the discount 
function v. In this model, we do not have the interest and survivorship discount function 
that we had before. However, the present values that we obtain will not be definite numbers 
but rather random variables, since they will depend on the unknown value that T assumes.
Premiums and reserves will be calculated as expectations of these random variables, and
it is in calculating these expectations that the probabilities of life and death are taken into 
account. These expectations are commonly known as actuarial present values, abbreviated 
as APV. 

Throughout this chapter and the next, we adopt the following notation, which will ensure 
that our theory is applicable both to the net premium situation, as discussed in Chapters 4-6 
and to the gross premium situation as discussed in Chapter 12. The premium symbol $\pi_k$
will denote the total inflow received at time k, which will be the net premium in a model
which ignores expenses, or in the expense situation will be the gross- or expense-augmented
premium less the expenses paid at time k. Similarly the benefit $b_k$ paid for failure at time k
will include any costs involved with making the benefit payment. However to keep things 
simple we confine our attention to a single decrement case where we have only one cause 
of failure. In particular then, in the context of life insurance we are ignoring withdrawals, 
implicitly making the assumption discussed in the second remark after Example 12.2. 


15.2 The stochastic approach to insurance benefits 


We deal here with contracts that provide benefits upon failure, and consider both the discrete 
and continuous cases. 


15.2.1 The discrete case 


We consider a contract with failure benefit vector b paying $b_{k-1}$ at time k for failure between
time k - 1 and time k. Let Z denote the present value of the benefits with respect to v. Then Z
is a function of the random variable $\tilde{T}$, since when $\tilde{T} = k$, the value of Z is $b_{k-1}v(k)$. We can
therefore write

\[ Z = b_{\tilde{T}-1}\,v(\tilde{T}). \]

Letting $A_{\tilde{T}}(b; v)$ denote E(Z), we have

\[ A_{\tilde{T}}(b; v) = \sum_{k=1}^{\infty} b_{k-1}v(k)\tilde{f}(k) = \sum_{k=0}^{\infty} b_k v(k+1)\tilde{f}(k+1). \tag{15.1} \]


As before, we will often suppress the v and write this as $A_{\tilde{T}}(b)$. Note that when T = T(x),
formula (14.29) shows that we obtain $A_x(b)$ as calculated in (5.1).

For many applications we want more information about Z than just its expected value; at
the very least, we would want to calculate Var(Z). This is easily done, since $Z^2 = b_{\tilde{T}-1}^2 v(\tilde{T})^2$.
In other words, to calculate the second moment, we simply square the benefits and square
the discount function. This leads to

\[ \mathrm{Var}(Z) = A_{\tilde{T}}(b^2; v^2) - A_{\tilde{T}}(b; v)^2. \tag{15.2} \]


15.2.2 The continuous case


Consider a contract that pays b(t) at the moment of failure, should failure occur at time t. 
We will let Z denote the present value of the benefits and denote the expected value of Z by
$\bar{A}_T(b; v)$, often shortened to $\bar{A}_T(b)$. When T = t, b(t) is paid at time t, so

\[ Z = b(T)v(T) \]

and

\[ \bar{A}_T(b; v) = \int_0^{\infty} b(t)v(t)f(t)\,dt. \tag{15.3} \]

When T = T(x), we see from (14.27) that this is the same quantity as we obtained in (8.18).
The variance is computed as we did in the discrete case:

\[ \mathrm{Var}(Z) = \bar{A}_T(b^2; v^2) - \bar{A}_T(b; v)^2. \tag{15.4} \]


Example 15.1 Suppose that $\mu(t)$ is a constant 0.04 and the force of interest is a constant
0.07. An insurance contract pays a benefit of $e^{0.03t}$ at time t, should failure occur at that time.
Find the expectation and variance of Z, the present value of the benefits.


Solution. This is an exponential failure time, so $f(t) = 0.04e^{-0.04t}$. We have $b(t) = e^{0.03t}$ and
$v(t) = e^{-0.07t}$. Substituting into (15.3),

\[ E(Z) = \int_0^{\infty} 0.04\,e^{0.03t}e^{-0.07t}e^{-0.04t}\,dt = \frac{0.04}{0.04 + 0.07 - 0.03} = \frac{1}{2}. \]

Now $b(t)^2 = e^{0.06t}$ and $v(t)^2 = e^{-0.14t}$, so calculating as above,

\[ E(Z^2) = \frac{0.04}{0.04 + 0.14 - 0.06} = \frac{1}{3}, \qquad \mathrm{Var}(Z) = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}. \]
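As a numerical check of Example 15.1, the following sketch evaluates the closed form $\mu/(\mu + \delta - g)$ and compares it with a crude midpoint-rule approximation of the integral in (15.3). The function names, the truncation point 400 and the step count are illustrative choices, not part of the text.

```python
import math

def apv_exponential(mu, delta, g):
    """E[e^{gT} e^{-delta T}] for T exponential with hazard mu (requires mu + delta > g)."""
    return mu / (mu + delta - g)

def midpoint_integral(func, upper, steps):
    """Midpoint-rule approximation of the integral of func over [0, upper]."""
    h = upper / steps
    return h * sum(func((j + 0.5) * h) for j in range(steps))

mu, delta, g = 0.04, 0.07, 0.03

mean_z = apv_exponential(mu, delta, g)
# Second moment: square the benefits and the discount function (double g and delta).
second_moment_z = apv_exponential(mu, 2 * delta, 2 * g)
var_z = second_moment_z - mean_z ** 2

# Independent check of E(Z) by integrating b(t) v(t) f(t) numerically.
numeric_mean = midpoint_integral(
    lambda t: math.exp(g * t) * math.exp(-delta * t) * mu * math.exp(-mu * t),
    upper=400.0,
    steps=100_000,
)

print(mean_z, var_z, numeric_mean)
```

The closed-form values agree with 1/2 and 1/12, and the numerical integral matches E(Z) to several decimal places.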


15.2.3 Approximation 


Suppose now that we know only the distribution of $\tilde{T}$, and wish to approximate $\bar{A}_T(b)$. If
interest and benefits are constant over each year, we can use the standard approximation to
obtain the same $i/\delta$ adjustment as we obtained before for T(x). We will give an alternate
stochastic derivation of this result. We first assume constant interest, so that $v(t) = v^t$ for some
constant v. Let $b_k$ denote the constant value of b(t) over the time interval k to k + 1, and let
$b = (b_0, b_1, \ldots)$. Recall the random variable $R = \tilde{T} - T$ introduced in Section 14.7. Then

\[ E(Z) = E[b(T)v^T] = E[b(T)v^{\tilde{T}}v^{-R}] = E[b(T)v^{\tilde{T}}]\,E[v^{-R}], \]

where we invoke independence for the last equality. When $\tilde{T} = j$, $b(T) = b_{j-1}$, so that
$E[b(T)v^{\tilde{T}}]$ is just $A_{\tilde{T}}(b)$. Moreover, $v^{-R} = (1 + i)^R$, and since R is uniform on [0, 1],

\[ E[v^{-R}] = \int_0^1 (1 + i)^u\,du = \frac{i}{\delta}, \]

leading to $\bar{A}_T(b) = (i/\delta)A_{\tilde{T}}(b)$.

The general formula, where interest can vary from year to year, is then given by

\[ \bar{A}_T(b) = A_{\tilde{T}}\!\left(\frac{i}{\delta} * b\right), \tag{15.5} \]

where $i/\delta$ is now a vector.
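The identity $E[v^{-R}] = i/\delta$ for R uniform on [0, 1] is easy to confirm numerically. The sketch below, with illustrative function names, compares the closed form with a midpoint-rule approximation of $\int_0^1 (1+i)^u\,du$.

```python
import math

def i_over_delta(i):
    """Closed-form adjustment factor i / delta, where delta = log(1 + i)."""
    return i / math.log(1.0 + i)

def mean_accumulation_over_year(i, steps=100_000):
    """Midpoint-rule approximation of E[(1 + i)^R] for R uniform on [0, 1]."""
    h = 1.0 / steps
    return h * sum((1.0 + i) ** ((j + 0.5) * h) for j in range(steps))

for rate in (0.05, 1.0):
    print(rate, i_over_delta(rate), mean_accumulation_over_year(rate))
```

The two columns agree closely for both a modest rate (5%) and the 100% rate used in the examples of this chapter.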


15.2.4 Endowment insurances 


In the deterministic model, endowment insurances were viewed as a kind of hybrid, consisting 
of both insurance and annuity components. In our stochastic model, we can view these strictly 
as insurances. Define the failure time 


T = min(T(x),n), 


so that failure occurs at the death of (x), or at time n if earlier. Insurances based on T are 
precisely n-year endowment insurances. When benefits are paid at the moment of death, T 
is an example of a random variable that has both a continuous part and a discrete part (see 
Section A.5). It is continuous over the interval [0, n), on which it has a density function f
satisfying

\[ P(a < T < b) = \int_a^b f(t)\,dt, \quad \text{provided } b < n. \]

This density will just be the restriction of the density function for T(x) to the interval [0, n),
and it integrates to $F_x(n)$ over this interval. The remaining probability is all concentrated on
the point n, since T will take this value whenever T(x) takes a value of n or greater, an event
with probability $s_x(n)$. Expectations of a general function g of T must be calculated as a sum
of two terms,

\[ E[g(T)] = \int_0^n g(t)f_x(t)\,dt + g(n)s_x(n). \]


In particular, consider an n-year endowment insurance with death benefit function b defined 
on [0, n]. That is, b(t) is paid at the moment of death if this occurs before n and b(n) is paid at 
time n if the insured is then alive (so that failure occurs at time n). Then 


\[ \bar{A}_T(b) = E[v(T)b(T)] = \int_0^n b(t)v(t)f_x(t)\,dt + b(n)v(n)s_x(n), \]


which is easily seen to agree with the present value obtained in the deterministic model.
From (15.4) we calculate the variance of the benefits as

\[ \int_0^n [b(t)v(t)]^2 f_x(t)\,dt + b(n)^2 v(n)^2 s_x(n) - [\bar{A}_T(b)]^2. \]

Some care is needed when applying the approximation (15.5). This is only valid for the
continuous part of the random variable T. In the present example of the endowment insurance,
Equation (15.5) would be modified to

\[ \bar{A}_T(b) = A_{\tilde{T}}\!\left(\frac{i}{\delta} * b\right) + b(n)v(n)s_x(n). \]


The last term does not get multiplied by the interest adjustment, since in both the
end-of-the-year-of-death and moment-of-death cases the final amount of b(n) is paid to
survivors at exact time n.


Example 15.2 A two-year-endowment insurance on (x) has benefits of 100 if death occurs 
in the first year or 80 if death occurs in the second year, and a pure endowment of 80 if the 
insured is alive at time two. You are given that $q_x = 0.2$, $q_{x+1} = 0.3$, and the interest rate is a
constant 100%. Find the expectation and variance of the benefits when (a) benefits are payable 
at the end of the year of death; (b) benefits are payable at the moment of death. 


Solution. (a) Let Z denote the present value of the benefits in the end of the year of death
case. As we noted in Example 6.2, we do not even need the value of $q_{x+1}$. In fact we could just
notice that Z takes the value 50 with probability 0.2 and 20 with probability 0.8. It therefore
has a mean of 26 and a variance of $30^2(0.2)(0.8) = 144$.

(b) Let $\bar{Z}$ denote the present value of the benefits in the moment of death case. Now we
do need the value of $q_{x+1}$ since it makes a difference if the person dies in the second year, in
which case 80 is paid at the moment of death, or if they survive, in which case 80 is paid at
the end of the year. Begin again with the calculation of E(Z) and $E(Z^2)$ but now split into the
insurance and pure endowment amounts.

\[ E(Z) = 14.8 + 80(1/4)(0.56) = 14.8 + 11.2. \]

\[ E(Z^2) = 596 + 6400(1/16)(0.56) = 596 + 224. \]

Now for i = 1, $\delta = \log(2)$ so

\[ E(\bar{Z}) = \left(\frac{1}{\log 2}\right)14.8 + 11.2 = 32.55. \]

Squaring v means that $\delta$ is doubled and the i changes to

\[ (1 + i)^2 - 1 = 2i + i^2, \]

which equals 3 when the original value is 1. So, for the second moment calculation we take
i = 3, $\delta = \log(4)$, to get

\[ E(\bar{Z}^2) = \left(\frac{3}{\log 4}\right)596 + 224 = 1513.77, \]

\[ \mathrm{Var}(\bar{Z}) = 1513.77 - 32.55^2 = 454.27. \]


The variance is much larger in (b), due to the variation in the present value of the benefits, 
depending on when the person dies during the year, which is significant in view of the high 
interest rate. 
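The calculations of Example 15.2 can be reproduced as follows. This is a sketch with illustrative names; it assumes a constant interest rate, so that v(k) is a power of v, and applies the $i/\delta$ adjustment only to the death-benefit part, as described above.

```python
import math

i = 1.0                      # 100% interest
v = 1.0 / (1.0 + i)
q = [0.2, 0.3]               # q_x, q_{x+1}
death_benefit = [100.0, 80.0]
endowment = 80.0
n = 2

# Probabilities of death in year 1, death in year 2, and survival to time n.
f_tilde = [q[0], (1.0 - q[0]) * q[1]]
s_n = (1.0 - q[0]) * (1.0 - q[1])

def end_of_year_parts(discount, benefits, pure):
    """Return (insurance part, pure endowment part) of E[benefit * discount factor]."""
    insurance = sum(benefits[k] * discount ** (k + 1) * f_tilde[k] for k in range(n))
    pure_endowment = pure * discount ** n * s_n
    return insurance, pure_endowment

ins1, end1 = end_of_year_parts(v, death_benefit, endowment)
ins2, end2 = end_of_year_parts(v * v, [b * b for b in death_benefit], endowment ** 2)

# (a) Benefits at the end of the year of death.
mean_a = ins1 + end1
var_a = ins2 + end2 - mean_a ** 2

# (b) Benefits at the moment of death: adjust only the insurance part by i/delta.
def i_over_delta(rate):
    return rate / math.log(1.0 + rate)

mean_b = i_over_delta(i) * ins1 + end1
i_squared_discount = (1.0 + i) ** 2 - 1.0      # rate corresponding to v^2
second_moment_b = i_over_delta(i_squared_discount) * ins2 + end2
var_b = second_moment_b - mean_b ** 2

print(mean_a, var_a)
print(round(mean_b, 2), round(second_moment_b, 2), round(var_b, 2))
```

Carrying full precision gives a variance in (b) of about 454.14; the figure 454.27 arises when the mean is first rounded to 32.55 before squaring.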


15.3 The stochastic approach to annuity benefits 


15.3.1 Discrete annuities 


Consider a contract with annuity benefit vector c that pays $c_k$ at time k provided that failure has
not yet occurred. Let Y denote the present value of the benefits with respect to the investment
discount function v, and let $\ddot{a}_{\tilde{T}}(c; v)$ (usually shortened to $\ddot{a}_{\tilde{T}}(c)$) denote E(Y). (The subscript
$\tilde{T}$ reflects the fact that in view of the yearly payments, this quantity will depend only on the
value of $\tilde{T}$, rather than the exact value of T.)

There are two methods for computing expectations and variances of annuities in the 
stochastic setting. The first has the advantage of simplifying variance calculations. Recall
that in the deterministic model we often viewed insurances as annuities. It is now helpful to 
view annuities as insurances in a certain sense. (Readers who attempted Exercise 5.22 will 
have encountered this idea previously.) To be precise, we want to express Y as a function of
$\tilde{T}$. Let g be the function defined on the positive integers by

\[ g(k) = c_0 + c_1 v(1) + c_2 v(2) + \cdots + c_{k-1}v(k-1). \]

We can also write $g(k) = \ddot{a}({}_kc; v)$ using the notation introduced in Section 2.10.1.

Suppose that $\tilde{T}$ takes the value k. Then payments will have been made at all integer times
from 0 to k - 1, so Y will take the value g(k). We can therefore write

\[ Y = g(\tilde{T}), \]

and it follows that

\[ \ddot{a}_{\tilde{T}}(c) = \sum_{k=1}^{\infty} g(k)\tilde{f}(k) \tag{15.6} \]

and

\[ \mathrm{Var}(Y) = \sum_{k=1}^{\infty} g(k)^2\tilde{f}(k) - \ddot{a}_{\tilde{T}}(c)^2. \tag{15.7} \]

Expression (15.6) is known as the aggregate payment formula, since we consider the function
g(k) which is the aggregate present value received for failure at time k.


Example 15.3 For a 4-year life annuity on (60), the sequence of annuity benefits is 1, 2,
3, 4 beginning at age 60. You are given $q_{60} = 0.1$, $q_{61} = 0.2$, $q_{62} = 0.3$ and i = 100%. Find
E(Y) and Var(Y).


Solution. Calculate recursively g(1) = 1, $g(2) = 1 + 2 \times \tfrac{1}{2} = 2$, $g(3) = 2 + 3 \times \tfrac{1}{4} = 2.75$,
$g(k) = 2.75 + 4 \times \tfrac{1}{8} = 3.25$ for all k > 3.

The distribution of $\tilde{T}$ is given by $\tilde{f}_{60}(1) = 0.1$, $\tilde{f}_{60}(2) = 0.180$, $\tilde{f}_{60}(3) = 0.216$, $s_{60}(3) = 0.504$.
We can now calculate

\[ E(Y) = 0.100 + 2(0.180) + 2.75(0.216) + 3.25(0.504) = 2.692, \]

\[ \mathrm{Var}(Y) = 0.100 + 2^2 \times 0.180 + (2.75)^2 \times 0.216 + (3.25)^2 \times 0.504 - 2.692^2 = 0.530. \]
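The aggregate payment calculation of Example 15.3 can be sketched in a few lines. The variable names are illustrative, and constant interest is assumed so that v(k) is a power of v.

```python
c = [1.0, 2.0, 3.0, 4.0]     # annuity benefits c_0, ..., c_3
q = [0.1, 0.2, 0.3]          # q_60, q_61, q_62
v = 0.5                      # i = 100%

# Survival probabilities s(0), ..., s(3).
s = [1.0]
for qk in q:
    s.append(s[-1] * (1.0 - qk))

# Aggregate present values: g_values[k - 1] holds g(k) for k = 1, ..., 4.
g_values = []
running_total = 0.0
for k, ck in enumerate(c):
    running_total += ck * v ** k
    g_values.append(running_total)

# P(T~ = 1), P(T~ = 2), P(T~ = 3), and the remaining mass s(3) on the value g(4).
probabilities = [s[k] - s[k + 1] for k in range(3)] + [s[3]]

mean_y = sum(g * p for g, p in zip(g_values, probabilities))
var_y = sum(g * g * p for g, p in zip(g_values, probabilities)) - mean_y ** 2

print(g_values)
print(round(mean_y, 3), round(var_y, 3))
```

Lumping all outcomes with $\tilde{T} \ge 4$ onto the single value g(4) is what makes the finite sum equivalent to the infinite sums in (15.6) and (15.7).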


Remark Note carefully that when the final payment of an annuity is at time $k_0$, then Y
takes the value of $g(k_0 + 1)$ with probability $s(k_0)$ since $g(k) = g(k_0 + 1)$ for all $k > k_0$.


Remark There is an important point here which is often overlooked. Equation (15.6) 
applied to life annuities says that we can view the present value of a set of cash flows with 
respect to the interest and survivorship function as the expected value of the present value of 
payments received with respect to an interest only function. Some people have mistakenly 
assumed that similarly, the accumulated value of a life annuity is the same as the expected 
value of the payments received, accumulated with interest. This is not true and in fact the 
equality does not hold for values at any time other than 0. We have 


\[ \mathrm{Val}_k(c; y_x) = y_x(k)^{-1}\,\ddot{a}_x(c), \]

while the expected value at time k, with respect to v, of the payments received from c is
$v(k)^{-1}\ddot{a}_x(c)$, which will be less than the first quantity for all k > 0.


The second method to calculate annuity present values is more closely related to what we 
did in the deterministic model. We define random variables 


\[ I_k = \begin{cases} 1, & \text{if } \tilde{T} > k, \\ 0, & \text{if } \tilde{T} \le k. \end{cases} \]

The contract pays a present value of $c_k v(k)$ for each integer k such that $\tilde{T} > k$, or equivalently,
such that $I_k = 1$. We can therefore write

\[ Y = \sum_{k=0}^{\infty} c_k v(k) I_k. \tag{15.8} \]

This is analogous to viewing the annuity as consisting of several pure endowments. Since
$E(I_k) = s(k)$, taking expectations gives

\[ \ddot{a}_{\tilde{T}}(c) = \sum_{k=0}^{\infty} c_k v(k) s(k). \tag{15.9} \]


Expression (15.9) is known as the current payment formula, since it looks at each payment 
received, discounts it with interest, and multiplies it by the probability of receiving it. In the 
case where T = T(x), this is precisely the formula obtained in (4.1). 

The equality of the expressions in (15.6) and (15.9) can be seen from (A.16). (Condition
(A.14) will almost always hold. It is automatic when benefits are positive since g will be 
increasing, or when there are finitely many benefit payments, even if negative, since g will be 
bounded.) 

Note that the $I_k$ are not independent. To calculate variances from the current payment
approach we use the following facts.

\[ \mathrm{Var}(I_k) = s(k)(1 - s(k)). \tag{15.10} \]

For j < k, we have $I_j I_k = I_k$, so that

\[ \mathrm{Cov}(I_j, I_k) = E(I_j I_k) - E(I_j)E(I_k) = s(k)(1 - s(j)). \tag{15.11} \]

To simplify notation let

\[ r_k = c_k v(k)s(k), \qquad u_k = c_k v(k)(1 - s(k)). \]

Then from (A.24), (15.10) and (15.11),

\[ \mathrm{Var}(Y) = \sum_k r_k u_k + 2\sum_k \sum_{j<k} r_k u_j. \tag{15.12} \]


This will normally be a much more involved calculation than that given by the aggregate 
payment approach. 
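To see that the current payment approach gives the same answers as the aggregate payment approach, the sketch below applies (15.9) and (15.12) to the data of Example 15.3. Names are illustrative and constant interest is assumed.

```python
c = [1.0, 2.0, 3.0, 4.0]     # annuity benefits c_0, ..., c_3
q = [0.1, 0.2, 0.3]          # q_60, q_61, q_62
v = 0.5                      # i = 100%

# Survival probabilities s(0), ..., s(3).
s = [1.0]
for qk in q:
    s.append(s[-1] * (1.0 - qk))

# r_k = c_k v(k) s(k) and u_k = c_k v(k) (1 - s(k)).
r = [c[k] * v ** k * s[k] for k in range(len(c))]
u = [c[k] * v ** k * (1.0 - s[k]) for k in range(len(c))]

# Current payment formula (15.9) for the mean.
mean_y = sum(r)

# Variance from (15.12): diagonal terms plus twice the covariance terms with j < k.
var_y = sum(rk * uk for rk, uk in zip(r, u)) + 2.0 * sum(
    r[k] * u[j] for k in range(len(c)) for j in range(k)
)

print(round(mean_y, 3), round(var_y, 3))
```

The double sum over j < k is the extra work that makes this route more involved than the aggregate payment calculation, even though both give 2.692 and 0.530 here.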


15.3.2 Continuous annuities 


Consider a contract that makes payments continuously at the annual rate of c(t) at time t, 
provided failure has not yet occurred. We will denote the actuarial present value by $\bar{a}_T(c; v)$ (often
shortened to $\bar{a}_T(c)$).

Define the function g by

\[ g(t) = \int_0^t c(r)v(r)\,dr. \]

If failure occurs at time s, the present value of the benefits received will be precisely g(s), so that

\[ Y = g(T). \]




It follows that

\[ \bar{a}_T(c) = E(Y) = \int_0^{\infty} g(t)f(t)\,dt \tag{15.13} \]

and

\[ \mathrm{Var}(Y) = \int_0^{\infty} g(t)^2 f(t)\,dt - (\bar{a}_T(c))^2. \tag{15.14} \]


For the alternate current payment formula we use (A.15). From the fundamental theorem
of calculus, $g'(t) = c(t)v(t)$ and, moreover, g(0) = 0. If (A.14) holds (which as in the discrete
case will almost always be the case) we can write the current payment formula

\[ \bar{a}_T(c) = \int_0^{\infty} c(t)v(t)s(t)\,dt, \tag{15.15} \]

which agrees with (8.9) when T = T(x). There is no convenient variance formula analogous
to (15.12) for continuous annuities, since we do not have the representation as a sum of indicator
random variables that was present in the discrete case.


Example 15.4 An annuity provides continuous benefits at the rate of 1 per period, provided
failure has not occurred. The failure time T has an exponential distribution with constant
hazard $\mu$. The force of interest is a constant $\delta$. If Y is the present value of the benefits, find
Var(Y).


First solution. Since

\[ g(t) = \int_0^t e^{-\delta r}\,dr = \frac{1 - e^{-\delta t}}{\delta}, \]

we have

\[ E(Y^2) = \int_0^{\infty} \left(\frac{1 - e^{-\delta t}}{\delta}\right)^2 \mu e^{-\mu t}\,dt = \frac{\mu}{\delta^2}\int_0^{\infty} [1 - 2e^{-\delta t} + e^{-2\delta t}]e^{-\mu t}\,dt \]

\[ = \frac{\mu}{\delta^2}\left[\frac{1}{\mu} - \frac{2}{\mu + \delta} + \frac{1}{\mu + 2\delta}\right] = \frac{2}{(\mu + \delta)(\mu + 2\delta)}. \]

From (15.13), or simply referring back to Section 8.10, we see that $E(Y) = 1/(\mu + \delta)$ so that

\[ \mathrm{Var}(Y) = \frac{2}{(\mu + \delta)(\mu + 2\delta)} - \frac{1}{(\mu + \delta)^2} = \frac{\mu}{(\mu + \delta)^2(\mu + 2\delta)}. \]

Note that, as a check, for $\delta = 0$, this reduces to $\mu^{-2}$, the variance of the given exponential
distribution (see Section A.11.6), which must be true since Y = T at zero interest.

A simplification. We can simplify and also obtain a more general formula by the following
trick:

\[ Y^2 = \frac{1 - 2e^{-\delta T} + e^{-2\delta T}}{\delta^2} = \frac{2}{\delta}\left[\frac{1 - e^{-\delta T}}{\delta} - \frac{1 - e^{-2\delta T}}{2\delta}\right]. \]

Upon taking expectations and recalling that $E[e^{-\delta T}] = \mu/(\mu + \delta)$, we get the same answer as
above.
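Both routes to Var(Y) in Example 15.4 can be compared numerically. In this sketch the function names and the parameter values $\mu = 0.04$, $\delta = 0.06$ are hypothetical choices made only for illustration.

```python
def annuity_variance_closed_form(mu, delta):
    """Var(Y) = mu / ((mu + delta)^2 (mu + 2 delta)) from the first solution."""
    return mu / ((mu + delta) ** 2 * (mu + 2.0 * delta))

def annuity_variance_via_trick(mu, delta):
    """Var(Y) using E(Y^2) = (2/delta) * [abar at force delta - abar at force 2*delta]."""
    mean = 1.0 / (mu + delta)
    second_moment = (2.0 / delta) * (1.0 / (mu + delta) - 1.0 / (mu + 2.0 * delta))
    return second_moment - mean ** 2

mu, delta = 0.04, 0.06   # hypothetical illustrative values

print(annuity_variance_closed_form(mu, delta))
print(annuity_variance_via_trick(mu, delta))
# At zero interest the closed form reduces to 1 / mu^2, the variance of T itself.
print(annuity_variance_closed_form(mu, 0.0), 1.0 / mu ** 2)
```

With these values both functions give a variance of 25, and the zero-interest check returns 625, which is $1/\mu^2$.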


*15.4 Deferred contracts 


Consider any contract based on the failure time T, in which no payments are to be made in 
the first m years regardless of whether failure occurs or not. A typical example is a deferred 
annuity, but this can be a perfectly general contract, continuous or discrete, including death 
benefits, annuity benefits or both. Let Y be the present value of the benefits, and let Y' be the 
value of the benefits at the starting time m. As mentioned previously, there is no real need for 
any special mathematical treatment of such, as we simply take zero entries or zero values in 
the benefit vector or function. It is sometimes convenient, however, to express Var(Y) in terms 
of Var(Y’). The purpose of this section is to derive such a formula. 

The idea is to notice that Y takes the value 0 with probability 1 — s(m) and v(m)Y' with 
probability s(m). It follows from basic probability theory that 


\[ E(Y) = s(m)E(v(m)Y') = v(m)s(m)E(Y'), \]

a formula that we are familiar with from the deterministic model.
It similarly follows that

\[ E(Y^2) = v(m)^2 s(m)E(Y'^2) = v(m)^2 s(m)[\mathrm{Var}(Y') + E(Y')^2]. \]

Substituting for E(Y) in terms of E(Y') in the expression $\mathrm{Var}(Y) = E(Y^2) - E(Y)^2$ gives

\[ \mathrm{Var}(Y) = v(m)^2 s(m)\mathrm{Var}(Y') + v(m)^2 s(m)(1 - s(m))E(Y')^2. \tag{15.16} \]
This gives a familiar decomposition of the variance of Y. The first term measures the 
uncertainty arising from actual benefits once they start at time m, and the second term 


measures the uncertainty arising from the fact that failure may or may not occur prior to 
time m. 
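Formula (15.16) can be checked on a toy case. Everything in the sketch below is hypothetical: the survival probability s(m) = 0.8, the discount factor v(m) = 0.5, and a deferred benefit value Y' that equals 10 or 20 with equal probability given survival to time m.

```python
s_m = 0.8                                # hypothetical probability of surviving to time m
v_m = 0.5                                # hypothetical discount factor v(m)
y_prime = {10.0: 0.5, 20.0: 0.5}         # hypothetical distribution of Y' given survival

mean_prime = sum(y * p for y, p in y_prime.items())
var_prime = sum(y * y * p for y, p in y_prime.items()) - mean_prime ** 2

# Decomposition (15.16).
var_formula = v_m ** 2 * s_m * var_prime + v_m ** 2 * s_m * (1.0 - s_m) * mean_prime ** 2

# Direct calculation: Y is 0 with probability 1 - s(m), otherwise v(m) * Y'.
y_dist = {0.0: 1.0 - s_m}
for y, p in y_prime.items():
    y_dist[v_m * y] = y_dist.get(v_m * y, 0.0) + s_m * p
mean_y = sum(y * p for y, p in y_dist.items())
var_direct = sum(y * y * p for y, p in y_dist.items()) - mean_y ** 2

print(var_formula, var_direct)
```

In this toy case the first term of (15.16) contributes 5 and the second contributes 9, for a total variance of 14, matching the direct calculation.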


15.5 The stochastic approach to reserves 

We will illustrate the stochastic approach to reserves with the discrete model. The results are
easily adapted to the continuous case. We have vectors b and $\pi$, where $b_k$ is paid at time
k + 1 if failure occurs between time k and k + 1, and $\pi_k$ is paid at time k if failure has not yet
occurred. We are adopting the point of view (used before in Chapter 6) of treating annuity
benefits as negative premiums, so that $\pi_k$ represents a net inflow at time k, consisting of the
premium received, less expenses paid at time k if these are being considered, and also less any 
annuity payments made at time k. We will therefore not need to refer to a separate annuity 
benefit vector. 

Fix an integer duration r and suppose that failure has not yet occurred by that time. 


Definition 15.1 The prospective loss at time r, denoted by ${}_rL$, is the value at time r of the
future net cash flows to be paid out.


These net cash flows are the future benefits to be paid minus the future net inflows to 
be received. The definition here is different than that used in the deterministic model. The 
benefits are the actual benefits paid on the contract, and not the individual share of the total 
death benefits. As we stressed in Section 15.1, the discounting is with regard to the investment
discount function only. The value will depend of course on when failure occurs, so ${}_rL$ will be
a random variable. It is a function of the shifted random variable $\tilde{T} \circ r$, since we have assumed
survival up to time r.

We next determine this function. Suppose that failure occurs at time r + t where $k \le t < k + 1$,
for some integer k. Then $\tilde{T} \circ r$ will assume a value of k + 1. The contract will pay out a
failure benefit of $b_{r+k}$ at time r + k + 1. Offsetting this, the net inflows would have been collected
from time r to time r + k, so that in this case the value of ${}_rL$ will be

\[ b_{r+k}\,v(r, r+k+1) - [\pi_r + \pi_{r+1}v(r, r+1) + \cdots + \pi_{r+k}v(r, r+k)]. \]

In terms of random variables, we can write

\[ {}_rL = b_{r+\tilde{T}\circ r-1}\,v(r, r+\tilde{T}\circ r) - [\pi_r + \pi_{r+1}v(r, r+1) + \cdots + \pi_{r+\tilde{T}\circ r-1}\,v(r, r+\tilde{T}\circ r-1)]. \]


Remark We have used the standard terminology here, but the reader should be aware 
that the word ‘loss’ in the above definition has a completely different connotation than that 
discussed in Section 6.4.1. It is not a definite number, measuring the loss brought about by 
deviation from the expected pattern of mortality or interest, but rather it is a random variable, 
giving the difference between what is collected and what is paid out under the contract at the 
different random times of failure. It should also be noted that ${}_rL$ is one of the few random variables
we encounter that can take negative values, which occur when there is a 'gain'.


Definition 15.2 In this stochastic model, we define the reserve at time r, denoted as before
by ${}_rV$, simply as

\[ {}_rV = E({}_rL). \]

So ${}_rV$ is the expected value of the prospective loss, which is the expected present value of
the future benefits less the expected present value of the future inflows. It is clear that in the
case of T = T(x) we obtain the same value as in the deterministic model.


Example 15.5 Let T = T(60). We are given $q_{60} = 0.1$, $q_{61} = 0.2$, $q_{62} = 0.3$, i = 100%.
Suppose we have a 3-year term insurance policy on (60) with $b_0 = 200$, $b_1 = 200$, $b_2 = 100$,
$\pi_0 = 20$, $\pi_1 = 20$, $\pi_2 = 10$.


(a) Find the distribution of ${}_0L$.
(b) Find the distribution of ${}_1L$.
(c) From your answer to (b), calculate ${}_1V$.


Solution.
(a) The distribution of ${}_0L$ is as follows:

    k      Value of 0L when T~ = k                              P(T~ = k)
    1      200 x 1/2 - 20 = 80                                  0.100
    2      200 x 1/4 - (20 + 20 x 1/2) = 20                     0.180
    3      100 x 1/8 - (20 + 20 x 1/2 + 10 x 1/4) = -20         0.216
    >= 4   0 - (20 + 20 x 1/2 + 10 x 1/4) = -32.5               0.504

(b) The distribution of ${}_1L$ is as follows:

    k      Value of 1L when T~ o 1 = k                          P(T~ o 1 = k)
    1      200 x 1/2 - 20 = 80                                  0.20
    2      100 x 1/4 - (20 + 10 x 1/2) = 0                      0.24
    >= 3   0 - (20 + 10 x 1/2) = -25                            0.56

(c) ${}_1V = 80 \times 0.20 + (-25 \times 0.56) = 2$. The reader should verify this agrees with the
answer that we would get from the methods of Chapter 6.
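The tables of Example 15.5 can be generated mechanically. The following sketch uses an illustrative helper, `prospective_loss_distribution`, and assumes constant interest, so that v(r, r + k) is the k-th power of v.

```python
def prospective_loss_distribution(b, pi, q, v, r):
    """Return [(value of rL, probability), ...] given survival to time r.

    The last entry corresponds to surviving past the end of the term.
    Assumes constant interest, so that v(r, r + k) = v ** k.
    """
    outcomes = []
    survival = 1.0          # probability of surviving from time r to time r + k
    inflow_value = 0.0      # value at time r of inflows collected so far
    for k in range(len(b) - r):
        inflow_value += pi[r + k] * v ** k
        loss = b[r + k] * v ** (k + 1) - inflow_value
        outcomes.append((loss, survival * q[r + k]))
        survival *= 1.0 - q[r + k]
    outcomes.append((-inflow_value, survival))
    return outcomes

b = [200.0, 200.0, 100.0]
pi = [20.0, 20.0, 10.0]
q = [0.1, 0.2, 0.3]
v = 0.5

loss_0 = prospective_loss_distribution(b, pi, q, v, r=0)
loss_1 = prospective_loss_distribution(b, pi, q, v, r=1)
reserve_1 = sum(value * prob for value, prob in loss_1)

print(loss_0)
print(loss_1)
print(round(reserve_1, 6))
```

The two lists reproduce the rows of tables (a) and (b), and the expected value of the second list gives the reserve of 2 found in part (c).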


15.6 The stochastic approach to premiums 


For this section only, it is convenient to adopt a different convention. Assume now that 
expenses and annuity payments are included as benefits, so that the inflows $\pi_k$ are just the
premiums collected.


15.6.1 The equivalence principle 


The random variable ${}_0L$ is usually denoted just as L. It is the present value of all future
benefits less the present value of all future premiums. Recall that in the deterministic model
we calculated net premiums by setting the present value of benefits equal to the present value
of premiums. The stochastic version of this concept is to set premiums so that 


E(L) = 0. 


This is known as the equivalence principle. 


15.6.2 Percentile premiums 


The stochastic viewpoint allows us to incorporate other features of the random variable L 
rather than just its expectation, when computing premiums. For example, it would be highly 
desirable for an insurer to avoid a positive loss. Of course, it is impossible to ensure that this 
will always hold. If the insured dies shortly after purchasing a life insurance policy, then L will 
almost certainly be positive. Suppose, however, we set a threshold $\alpha$, and then set premiums
as small as we can, so that the probability will be at least $\alpha$ that the loss will be nonpositive.
Typically $\alpha$ will be a number close to 1 such as 0.95 or 0.99. These are sometimes referred to
as percentile premiums. 

This sounds reasonable, but there are major problems with such an approach. In the first
place, the method does not take into account the important right tail of the distribution, 
consisting of those values of L which are greater than the specified percentile. If there are 
very large claims present, the method may not provide sufficient premiums to cover the risk. 
This is evidenced by the fact that the method can produce premiums that are actually less than 
the equivalence principle premium. For an extreme example of this, suppose we have an
n-year term policy and the probability of (x) dying within n years is less than $1 - \alpha$. We could
achieve the stated goal by charging premiums of 0, which is absurd.

On the other hand, for high values of $\alpha$ the method can produce premiums that seem
inordinately high when compared with the equivalence principle premium. See Example 15.7 
below. 

Still another instance of the pathological behaviour produced by percentile premiums is 
illustrated by the following example. 


Example 15.6 On a single premium contract, the present value of the benefits takes the 
value 0 with probability 0.94, 1 with probability 0.01 and 3 with probability 0.05. 


(a) Find the percentile premium for $\alpha = 0.95$.


(b) Suppose the insurer plans to sell 2 contracts. Find the smallest possible premium for 
each, such that there is a probability of at least 0.95 that the total premiums will cover 
the total benefits on both contracts. 


Solution. 
(a) This is obviously 1, which gives exactly a 0.95 probability of covering the benefits. 


(b) Using independence, direct calculation shows that the probability of the total present 
value of benefits being less than or equal to z is 0.9025 for z = 2 or 0.9965 for z = 3. 
The smallest total premium needed is then 3 resulting in a charge of 1.5 per policy.

The conclusion of this example is exactly contrary to what one would expect, and indicates
a major flaw in the percentile premium approach. As the number of policies increases, the 
risk should decrease due to diversification. Unfavourable occurrences on one contract can 
be offset by favourable ones on another. The premium per policy should therefore decrease 
rather than increase. 
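The probabilities quoted in Example 15.6 can be confirmed by enumerating all outcomes. The helper below is illustrative; it assumes independent, identically distributed contracts and includes a small tolerance to guard against floating-point rounding at the threshold.

```python
from itertools import product

# Distribution of the present value of benefits on one contract (Example 15.6).
single_contract = {0: 0.94, 1: 0.01, 3: 0.05}

def total_benefit_distribution(dist, n_contracts):
    """Distribution of total benefits on n independent, identical contracts."""
    totals = {}
    for combo in product(dist.items(), repeat=n_contracts):
        total = sum(value for value, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        totals[total] = totals.get(total, 0.0) + prob
    return totals

def percentile_total_premium(dist, n_contracts, alpha):
    """Smallest total premium z with P(total benefits <= z) >= alpha."""
    totals = total_benefit_distribution(dist, n_contracts)
    cumulative = 0.0
    for z in sorted(totals):
        cumulative += totals[z]
        if cumulative >= alpha - 1e-12:   # tolerance for floating-point rounding
            return z
    return max(totals)

for n in (1, 2):
    total = percentile_total_premium(single_contract, n, alpha=0.95)
    print(n, total, total / n)
```

For one contract the premium is 1, while for two contracts the total is 3, or 1.5 per policy, which is the counterintuitive increase discussed above.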

Despite the drawback inherent in percentile premiums, there may be times when one wants
to compute these, possibly for comparison with premiums produced by other methods. Here
is a procedure for the case where benefits are paid at the end of the year of failure. We want
to find the smallest possible level premium $\pi$ payable for h years, such that the probability of
a nonpositive L is greater than or equal to some given number $\alpha$. We consider only the case
where L decreases as the value of T increases. This is fairly typical. As the time of failure 
increases, the discount factor reduces the present value of the benefit paid. In addition, the 
later the occurrence of failure, the more premiums will be collected. Both of these factors 
tend to decrease L. (Of course, the required condition may not hold in the case where benefits 
are increasing rapidly.) The procedure is as follows: 


1. Let $k_0$ be the largest positive integer $k$ such that $s(k) \ge \alpha$. 


2. Solve for $\pi$ so that $L$ is 0 when $T = k_0 + 1$. That is, 

$$b_{k_0}\, v(k_0 + 1) = \pi\, \ddot{a}(1_r; v),$$

where $r$ is the minimum of $k_0 + 1$ and $h$. Since $L$ is decreasing with $k$, $T > k_0$ implies that 
$L \le 0$. Therefore, 

$$\alpha \le s(k_0) \le P(L \le 0).$$

Moreover, if we take any smaller premium $\pi'$, then for $T = k_0 + 1$, $L$ will be positive, so that 

$$P(L \le 0) \le s(k_0 + 1) < \alpha,$$

by our choice of $k_0$. 

Example 15.7 Refer back to Example 15.5. Find the level percentile premium $\pi$ payable 
for 3 years if $\alpha = 0.8$. 


Solution. We take $k_0 = 1$ since $P(T > 1) = 0.90 > 0.8$, while $P(T > 2) = 0.72 < 0.8$. When 
$T = 2$, $L = 200 \times \tfrac{1}{4} - \pi(1 + \tfrac{1}{2})$, so that $\pi = 33.33$. This is much higher than the equivalence 
principle level premium of 13.31. 
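
The two-step procedure can be sketched in code. This is an illustration only, under assumptions that are not in the text: interest is taken to be constant, so that $v(k) = v^k$, and the function name and argument layout are hypothetical. The inputs are the figures quoted in Example 15.7 (a benefit of 200 in the second year, a discount factor of 1/2 per year); the first-year benefit is not needed by the procedure and is left out.

```python
def level_percentile_premium(s, b, v, h, alpha):
    """Level percentile premium by the two-step procedure (hypothetical helper).

    s[k] = P(T > k) for k = 0, 1, ...; b[k] = benefit paid at time k + 1 for
    failure in the year k to k + 1; v = constant yearly discount factor
    (an assumption of this sketch); h = number of premiums; alpha = target probability.
    """
    # Step 1: the largest positive integer k0 with s(k0) >= alpha.
    candidates = [k for k in range(1, len(s)) if s[k] >= alpha]
    if not candidates:
        raise ValueError("no positive k with s(k) >= alpha")
    k0 = max(candidates)
    # Step 2: choose the premium so that L = 0 when T = k0 + 1.
    r = min(k0 + 1, h)
    annuity_due = sum(v ** j for j in range(r))  # premiums at times 0, ..., r - 1
    return k0, b[k0] * v ** (k0 + 1) / annuity_due

# Figures quoted in Example 15.7: s(1) = 0.90, s(2) = 0.72, benefit 200 in year 2.
k0, premium = level_percentile_premium(
    s=[1.0, 0.90, 0.72], b={1: 200.0}, v=0.5, h=3, alpha=0.8
)
print(k0, round(premium, 2))
```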


15.6.3 Aggregate premiums 


An analysis of the examples above indicates that the percentile premium method can give 
unreasonable results with highly asymmetric distributions, where there is a large probability 
of a value in one of the tails. This approach, however, is a reasonable one to use in order 
to compute the total premiums for a large group of policies, where the distribution of total 
benefits is likely to be more heavily concentrated about the mean. Suppose the insurer issues 
several contracts of the same type and wants to be fairly certain that, in the aggregate, the total 
premiums collected will cover the total benefits. There are two major questions that could 
be considered. First, the premiums may be fixed and we want to determine the number of 
contracts to sell to achieve the desired confidence. Second, the number of contracts may be 
fixed and the problem is to determine the premium to charge. In general, exact calculations 
are prohibitive, but we can get approximate answers by assuming that the totality of all losses 
has a normal distribution (see Section A.11.5). 

It is convenient to introduce the following quantity, which we will discuss in some 
generality. 


Definition 15.3 For a random variable $X$ with a positive mean, the coefficient of variation of 
$X$ is the quantity 

$$CV(X) = \frac{\sqrt{\mathrm{Var}(X)}}{E(X)}.$$

It is clear from (A.8) and (A.10) that for any positive constant $c$, 

$$CV(cX) = CV(X),$$


showing that, unlike variance or standard deviation, this is a measure of variation that is 
independent of the particular units. So for example, if we measure loss in US dollars and then 
change the units to pounds sterling, the variance will be quite different, but the coefficient of 
variation will remain the same. 

There are some other facts about this quantity that we want to derive. Suppose 
$S = X_1 + X_2 + \cdots + X_N$, where these are independent random variables each distributed as $X$. 
Then 

$$\mathrm{Var}(S) = N\,\mathrm{Var}(X), \qquad E(S) = N\,E(X),$$

showing that 

$$CV(S) = \frac{CV(X)}{\sqrt{N}}, \tag{15.17}$$

illustrating the expected diversification effect that occurs when we take an independent 
sum of random variables. We expect to reduce variation, as some high values are offset by 
low values. 

Next, suppose that $X$ is a normal random variable with positive mean, and we want the 
probability that $X$ takes a value greater than or equal to 0. This is clearly independent of 
any units, so we should be able to express it in terms of the coefficient of variation. Indeed, 
suppose $X$ has mean $\mu$ and standard deviation $\sigma$, so that $X = \mu + \sigma Z$ where $Z$ is the standard 
normal. Then $P(X \ge 0) = P(Z \ge -\mu/\sigma)$, which by symmetry is equal to $P(Z \le \mu/\sigma)$. We can 
write 

$$P(X \ge 0) = \Phi[CV(X)^{-1}], \tag{15.18}$$

where $\Phi$ is the cumulative distribution function of the standard normal. 

Consider now the first question given above. Suppose we have a contract with a prospective 
loss at time 0 of $L$, where $E(L) < 0$, and we want to determine the smallest value of $N$ so 
that if $N$ independent contracts are sold, the probability is at least $\alpha$ that premiums will cover 
benefits. 

Let $S$ denote the aggregate loss on all contracts and apply our results above to $-S$, the 
aggregate gain. We want a probability of $\alpha$ that $-S$ will be positive. Assume that $-S$ is normal. 
Invoke (15.18) with $X = -S$, and take $\Phi^{-1}$ of each side, resulting in 

$$\Phi^{-1}(\alpha) = CV(-S)^{-1}.$$

Then apply (15.17) with $X = -L$ to get $\Phi^{-1}(\alpha) = \sqrt{N}/CV(-L)$, so that 

$$\sqrt{N} = \Phi^{-1}(\alpha)\,CV(-L). \tag{15.19}$$

We see then that the required number of contracts increases as the confidence level goes up, 
and also as the uncertainty in $L$ goes up, which is just what we expect to happen. 

For the second question above where $N$ is fixed, consider a single premium contract with 
present value of benefits $Z$. Let the required premium be of the form $(1 + \theta)E(Z)$. The quantity 
$\theta$ is known as the relative security loading; the terms risk loading and contingency loading 
are also used. (So if $\theta = 0.20$, for example, it means that the premium is calculated by adding 
20% to the equivalence principle premium.) Now in this case 

$$-L = (1 + \theta)E(Z) - Z,$$

so that $\mathrm{Var}(-L) = \mathrm{Var}(Z)$, and $E(-L) = \theta E(Z)$. Therefore 

$$CV(-L) = \frac{CV(Z)}{\theta}. \tag{15.20}$$

Substituting from (15.19) we can write 

$$\theta = \frac{\Phi^{-1}(\alpha)\,CV(Z)}{\sqrt{N}}. \tag{15.21}$$

This is certainly reasonable as it shows that the loading increases with the tolerance level, 
and the uncertainty in the benefits, but decreases as more contracts are sold. 


Example 15.8 Suppose the force of interest is a constant 0.06 and that $\mu(t) = 0.04$ for all 
$t$. An insurer sells 100 contracts, each of which pays 1000 upon failure. Assuming a normal 
approximation, how much should be charged as a single premium on each contract, so that 
there is a 95% chance that premiums will be sufficient to cover the aggregate claims for this 
group of 100 contracts? 


Solution. Let $Z$ be the present value of the benefit for a contract paying one unit at failure. Then 

$$E(Z) = \frac{\mu}{\mu + \delta} = 0.4, \qquad E(Z^2) = \frac{\mu}{\mu + 2\delta} = 0.25.$$

The formula for the second moment reflects the fact that squaring the discount function is 
equivalent to doubling the force of interest. It follows that $\mathrm{Var}(Z) = 0.09$ and $CV(Z) = 0.75$. 
From (15.21), with $\alpha = 0.95$, $\theta = 1.645(0.75)/10 = 0.123375$. For each 1000-unit contract the 
equivalence principle premium is 400 and so the premium charged should be 400(1.123375) = 
449.35. 
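
The arithmetic of Example 15.8 can be reproduced with a few lines of code. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the normal quantile comes from `statistics.NormalDist` (about 1.6449) rather than the rounded 1.645 used in the text, so the last digits can differ slightly.

```python
from math import sqrt
from statistics import NormalDist

mu, delta = 0.04, 0.06          # constant hazard rate and force of interest
n_contracts, benefit, alpha = 100, 1000, 0.95

ez = mu / (mu + delta)          # E(Z) for a 1-unit benefit
ez2 = mu / (mu + 2 * delta)     # E(Z^2): same formula with the force of interest doubled
cv_z = sqrt(ez2 - ez ** 2) / ez # coefficient of variation of Z

z_alpha = NormalDist().inv_cdf(alpha)
theta = z_alpha * cv_z / sqrt(n_contracts)   # relative security loading, as in (15.21)
premium = benefit * ez * (1 + theta)
print(round(cv_z, 4), round(theta, 6), round(premium, 2))
```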


Example 15.9 Suppose the contract in Example 15.8 is sold with premiums paid continuously 
for life at a level rate, where the security loading $\theta = 0.10$. Assuming a normal 
approximation, how many contracts must be sold, so there is a 95% chance that total premiums 
will cover total benefits? 


Solution. Since our formulas are independent of units we can ignore the 1000 and consider a 
1-unit contract. The main problem is to determine $CV(-L)$. This is not always so easy when 
we no longer have a single premium, but it can be done for level benefits and premiums. This is 
carried out in Chapter 16, and we refer the reader to formulas (16.5) and (16.6) with $t = 0$. The 
equivalence principle premium is 0.04, so the premium $\pi = 0.044$ and $\pi/\delta = 11/15$. From 
formula (16.6), $\sqrt{\mathrm{Var}(L)} = (26/15)(0.3) = 0.52$. From formula (16.5), $E(L) = (26/15)(0.4) - 
11/15 = -0.04$. So 

$$CV(-L) = \frac{0.52}{0.04} = 13.$$

From (15.19), $\sqrt{N} = 1.645 \times 13$, so $N = 457.32$. This means that 458 contracts must be sold. 
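
As a check on Example 15.9, the following sketch recomputes the coefficient of variation of the gain and the required number of contracts. It assumes the constant-force setting of Example 15.8 and the Chapter 16 formulas for the mean and variance of the loss with level continuous premiums; the variable names are illustrative only. The exact normal quantile is used instead of 1.645, so the unrounded value of $N$ differs slightly from 457.32, but the rounded-up answer is the same.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, delta, theta, alpha = 0.04, 0.06, 0.10, 0.95

a_bar_ins = mu / (mu + delta)                      # E(v^T) = 0.4
var_vt = mu / (mu + 2 * delta) - a_bar_ins ** 2    # Var(v^T) = 0.09
a_bar_ann = (1 - a_bar_ins) / delta                # expected annuity value = 10

pi = (1 + theta) * a_bar_ins / a_bar_ann           # loaded premium rate = 0.044
scale = 1 + pi / delta                             # 26/15
sd_loss = scale * sqrt(var_vt)                     # 0.52
mean_loss = scale * a_bar_ins - pi / delta         # -0.04

cv_gain = sd_loss / (-mean_loss)                   # CV(-L) = 13
n_required = (NormalDist().inv_cdf(alpha) * cv_gain) ** 2   # from (15.19)
print(round(cv_gain, 4), ceil(n_required))
```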


15.6.4 General premium principles 


There are many other possibilities for computing premiums to allow for the random nature of 
the loss. These are known in general as premium principles. Formally, a premium principle is 
a function $H$, which assigns to each random variable $X$ representing a loss, a number $H(X)$, 
which is the premium to be charged for accepting the risk $X$. This definition is applicable to 
contracts purchased by a single premium, rather than periodic premiums. 

The equivalence principle premium is given by $H(X) = E(X)$. The standard deviation 
principle is given by $H(X) = E(X) + \beta\sqrt{\mathrm{Var}(X)}$, for some positive $\beta$. Dividing this by $E(X)$, 
we can see from (15.21) that the standard deviation principle can be interpreted (assuming a 
normal approximation) as setting premiums so that there is a certain probability, which will 
depend on $N$, that premiums will cover losses. Another example is the variance principle given 
by $H(X) = E(X) + \beta\,\mathrm{Var}(X)$, for some positive $\beta$. (Exercise 15.25 points out a flaw in this principle.) 
There is extensive treatment in the actuarial literature comparing the properties of these and 
other premium principles. We will not elaborate further here, but more comments can be 
found in Chapter 22. 
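
To make the three principles concrete, here is a small sketch that evaluates each one on the single-contract distribution of Example 15.6. The loading parameter of 0.5 is an arbitrary value chosen only for illustration, and the function names are hypothetical.

```python
from math import sqrt

# Present value of benefits from Example 15.6.
dist = {0: 0.94, 1: 0.01, 3: 0.05}

def mean(d):
    return sum(x * p for x, p in d.items())

def variance(d):
    m = mean(d)
    return sum((x - m) ** 2 * p for x, p in d.items())

def equivalence_principle(d):
    return mean(d)

def standard_deviation_principle(d, beta):
    return mean(d) + beta * sqrt(variance(d))

def variance_principle(d, beta):
    return mean(d) + beta * variance(d)

beta = 0.5  # hypothetical loading parameter, not taken from the text
print(equivalence_principle(dist),
      standard_deviation_principle(dist, beta),
      variance_principle(dist, beta))
```

Note that the variance principle is not unit-free: rescaling the loss by a constant multiplies the variance term by the square of that constant, whereas the mean and the standard deviation term scale linearly.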


15.7 The variance of ${}_rL$ 


We have seen that in our stochastic model, the reserve represents the expected value of the 
amount that we need in order to meet future obligations. It is useful to have more information 
about the distribution of this random variable. In this section we present a formula for 
calculating the variance of ${}_rL$ under the setup of Section 14.2. Of course, if we have computed 
the exact distribution of ${}_rL$, as we did in Example 15.5, we could calculate the variance 
directly. The present method will allow us to calculate the variance without knowing the 
exact distribution, as long as we know all the reserves after time $r$. Moreover, it gives us a 
decomposition of this variance into that attributable to each of the future years. In the actuarial 
literature, this result is known as Hattendorf’s theorem. 

We start by taking $r = 0$, so we compute the variance of $L = {}_0L$. To simplify the task we 
look at the net cash outflow year by year. Let $C_k$ denote the value at time $k$ of the net amount 
paid out in the year from $k$ to $k + 1$. That is, $C_k$ is equal to the value at time $k$ of the failure 
benefit paid out at the end of the year, less the inflow at time $k$. So $C_k$ is a random variable 
taking three possible values depending on whether failure has occurred before time $k$, during 
the year $(k, k + 1)$ or at time $k + 1$ or later. If failure occurs before time $k$, nothing is paid 
and nothing is received. If failure occurs during the year $(k, k + 1)$, then $b_k$ is paid at time 
$k + 1$ and $\pi_k$ is received at time $k$. If failure occurs after time $k + 1$, nothing is paid out and a 
premium of $\pi_k$ is received at time $k$. The following table summarizes the possible values of 
$C_k$ and the respective probabilities: 


Time of failure               Value of $C_k$               Probability 

Before time $k$               $0$                          $1 - s(k)$ 

Between time $k$ and $k+1$    $v(k, k+1)\,b_k - \pi_k$     $f(k+1) = s(k)\lambda(k+1)$ 

At time $k+1$ or later        $-\pi_k$                     $s(k+1) = s(k)(1 - \lambda(k+1))$ 


The random variable $L$ is related to the values of $C_k$ by 

$$L = \sum_{k=0}^{\infty} v(k)\,C_k. \tag{15.22}$$

A fundamental property is that 

$$C_j C_k = -\pi_j C_k, \qquad j < k. \tag{15.23}$$

To verify (15.23), observe that it is obvious if $C_k = 0$, since then both sides are 0. If $C_k$ is not 
equal to 0, it means that failure did not occur before time $k$ and therefore must have occurred 
at time $j + 1$ or later. We see from the table above that $C_j$ is equal to $-\pi_j$. 

We will first derive a variance formula for the case of a contract with zero reserves. This 
means that the premium paid at any time $k$ is just enough to cover the death benefit paid at 
the end of that year, so that 

$$\pi_k = v(k, k + 1)\,b_k\,\lambda(k + 1), \quad \text{for all } k. \tag{15.24}$$

The significance of this case is that, as we show below, the random variables $C_k$ for different 
values of $k$ are uncorrelated (although they are not independent). Moreover, we have a fairly 
easy expression for the variance of each $C_k$, so that (15.22) gives us a formula for $\mathrm{Var}(L)$. 

Substituting from (15.24), we see that the second and third values of $C_k$ as listed above are 
respectively $v(k, k + 1)\,b_k[1 - \lambda(k + 1)]$ and $-v(k, k + 1)\,b_k\,\lambda(k + 1)$. It follows immediately 
that under assumption (15.24), 

$$E[C_k] = 0, \quad \text{for all } k, \tag{15.25}$$

so that from (15.23), if $j < k$, 

$$\mathrm{Cov}(C_j, C_k) = -\pi_j E(C_k) = 0. \tag{15.26}$$

Moreover, after some algebraic manipulation, 

$$\mathrm{Var}[C_k] = E[C_k^2] = v(k, k + 1)^2\, b_k^2\, s(k)\,\lambda(k + 1)(1 - \lambda(k + 1)). \tag{15.27}$$

Invoking the fact that the variance of a sum of uncorrelated random variables is the sum of the 
variances, and noticing that $s(k)(1 - \lambda(k + 1)) = s(k + 1)$, we deduce from (15.22) that 

$$\mathrm{Var}(L) = \sum_{k=0}^{\infty} v(k + 1)^2\, b_k^2\, s(k + 1)\,\lambda(k + 1).$$

We next consider the case of a general duration $r$, still keeping the assumption of zero 
reserves. In this case, ${}_rL$ will just be ${}_0L$ for the policy on $(x + r)$ with the benefits and premiums 
starting at time $r$, and with probabilities conditioned on the survival of $(x)$ to age $x + r$. The $s$ 
and $\lambda$ in the formula above will be those associated with the distribution of $T \circ r$. The result, 
using (15.10) and the discrete counterpart of (15.8), is 

$$\mathrm{Var}({}_rL) = \sum_{k=0}^{\infty} v(r, r + k + 1)^2\, b_{r+k}^2\, \frac{s(r + k + 1)}{s(r)}\, \lambda(r + k + 1).$$


Finally, we consider the general case where we remove the restriction given by (15.24). 
We split the policy into two parts, the risk portion and the savings portion, in exactly the 
same way as we did in Section 6.4.2. This leads to a corresponding split of the prospective 
loss into that portion of this loss attributable to the risk portion and that attributable to the 
savings portion. But the prospective loss for the savings portion must be a constant, since it is 
completely independent of the mortality experience. (In fact this constant is just the present 
value, with respect to interest, of the pure endowment paid out at the end, less the savings 
portions of the premiums. Since the fund is reduced to zero, this present value must be what 
you started with, which is ${}_rV$.) This means that the variance of the prospective loss is the same 
as the variance of the prospective loss on the risk portion. Therefore, the only change from 
the previous formula is to replace the original death benefits by the benefits attributable to the 
risk portion, namely the net amount at risk. Our final formula is 

$$\mathrm{Var}({}_rL) = \sum_{k=0}^{\infty} v(r, r + k + 1)^2\, \eta_{r+k}^2\, \frac{s(r + k + 1)}{s(r)}\, \lambda(r + k + 1). \tag{15.28}$$

This leads to a backwards recursion formula. Splitting off the first term, 

$$\mathrm{Var}({}_rL) = v(r, r + 1)^2\, \frac{s(r + 1)}{s(r)} \left[\eta_r^2\, \lambda(r + 1) + \mathrm{Var}({}_{r+1}L)\right]. \tag{15.29}$$

This is useful if $T$ is bounded, as is true with $T(x)$. We can start the recursion with a value of 
0 for the variance of the loss at the final duration. 


Example 15.10 Refer back to Example 15.5. Use (15.28) to find the variance of ${}_1L$. 


Solution. We will first calculate the product of the mortality factors in (15.28). In the 
case where $T = T(x)$, this product for index $k$ is ${}_kp_{x+1}\, p_{x+1+k}\, q_{x+1+k}$. For $k = 0$ we get $0.8 \times 0.2$, 
and for $k = 1$ we get $0.8 \times 0.7 \times 0.3$. We also calculate that ${}_2V = 5$, so that $\eta_1 = 195$. Then 

$$\mathrm{Var}({}_1L) = \tfrac{1}{4} \times 195^2 \times 0.16 + \tfrac{1}{16} \times 100^2 \times 0.168 = 1626.$$

We can verify this answer directly from the solution to Example 15.5, where we have the 
complete distribution of ${}_1L$. 
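
The backward recursion for the variance can be sketched as follows and checked against the figure of 1626 in Example 15.10. This is an illustration under stated assumptions, not the author's method: interest is taken as constant with yearly discount factor 1/2 (the value implied by the factors 1/4 and 1/16 in the example), and the function name and list layout are hypothetical. The inputs are the one-year survival probabilities 0.8 and 0.7, the hazards 0.2 and 0.3, and the net amounts at risk 195 and 100 used in the example.

```python
def variance_by_recursion(v, survival, hazard, net_at_risk):
    """Variance of the prospective loss by backward recursion (hypothetical helper).

    All lists are indexed by year, starting at the valuation duration:
    survival[i]    = probability of surviving year i given survival to its start,
    hazard[i]      = probability of failing in year i given survival to its start,
    net_at_risk[i] = benefit at the end of year i less the reserve at that time.
    v is a constant yearly discount factor (an assumption of this sketch).
    """
    variance = 0.0  # variance of the loss at the final duration
    for p, lam, eta in zip(reversed(survival), reversed(hazard), reversed(net_at_risk)):
        variance = v ** 2 * p * (eta ** 2 * lam + variance)
    return variance

var_1L = variance_by_recursion(
    v=0.5, survival=[0.8, 0.7], hazard=[0.2, 0.3], net_at_risk=[195.0, 100.0]
)
# The same quantity as a direct sum of the two yearly contributions.
direct = 0.25 * 195.0 ** 2 * 0.16 + 0.0625 * 100.0 ** 2 * 0.168
print(round(var_1L, 6), round(direct, 6))
```

The intermediate value produced by the first pass of the loop is the variance of the loss one year later, which is what makes the recursion convenient in a spreadsheet column.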


15.8 Standard notation and terminology 


The only major item we have not yet discussed is the use of an upper left superscript 2 to denote 
a quantity that is computed at a force of interest that is double that of the standard one or, 
equivalently, at a discount function that is the square of the standard one. This was developed 
for variance formulas. Its use is, however, restricted to cases where the failure benefits are 
either 0 or 1. In that case, squaring the benefits leaves them unchanged, and the second 
moments are calculated by just squaring the discount function. For example, the variance 
for the benefits paid on a 1-unit, end-of-the-year-of-death, n-year term insurance would be 
written as 

$${}^2A^{1}_{x:\overline{n}|} - \left(A^{1}_{x:\overline{n}|}\right)^2.$$


Notes and references 


To be precise, one should specify what convergence means in Equation (15.8) in the case 
that the sum is infinite. We did not worry about that here, since the important result for our 
purposes is (15.9) which, as we noted, is valid in almost all cases. 

Young (2004) discusses a variety of premium principles and their properties. 


Exercises 


Type A exercises 


15.1 A failure time $T$ is uniformly distributed on the interval [0,10]. An insurance contract 
pays e099' at the moment of failure if this occurs at time $t$. The force of interest is a 
constant 0.04. Find the expectation and variance of the present value of the benefits. 


15.2 A 3-year life annuity on (x) and (y) provides for annuity benefits of 1 at time 0, 2 at 
time 1 and 4 at time 2, provided that both individuals are alive. You are given that 
$p_x = 0.8$, ${}_2p_x = 0.6$, $p_y = 0.7$, ${}_2p_y = 0.5$ and the interest rate is a constant 100%. 

(a) Find the expectation and variance of the present value of the benefits. 

(b) A single premium for the contract is found by adding a 20% relative risk loading 
to the present value. What is the probability that the premium will be sufficient to 
provide for the benefit payments on a single contract? 


15.3 A life annuity contract provides for continuous payments at the annual rate of e~°-9* 
at time $t$. The force of mortality is a constant 0.1, and the force of interest is a constant 
0.05. If $Y$ is the present value of the benefits, find $E(Y)$ and $\mathrm{Var}(Y)$. 


15.4 A 10-year term insurance policy provides for a payment of $10 - t$ at the moment of 
the first death of (x) and (y), should this occur at time $t$. The mortality of (x) and (y) 
is as given in Exercise 14.9. Assuming an interest rate of 0, find the expectation and 
variance of the present value of the benefits. 


15.5 A failure time $T$ is uniformly distributed on the interval [0,10]. There is a constant 
interest rate of zero. An annuity provides for continuous payments at the rate of $t$ at 
time $t$, provided that failure has not yet occurred. Find the expectation and variance 
of the present value of the benefits. 


15.6 A 4-year pure endowment contract on (60) provides for 1000 paid at age 64 if (60) is 
then alive. Nothing is paid if death occurs before age 64. This is purchased by four level 
annual premiums of 100. You are given that $q_{60} = 0.1$, $q_{61} = 0.2$, $q_{62} = 0.25$, $q_{63} = 
0.30$, and the interest rate $i$ is a constant 25%. Write down the exact distribution of , L. 
Use this to find ,V. 


15.7 A certain product has a failure time of $T$. The approximating discrete random variable 
$\tilde{T}$ has a constant hazard function $\lambda(k) = 0.3$. (In other words, there is a 30% chance 
that the product will fail each year, given that it was still working at the beginning 
of the year.) An insurer agrees to pay 100 at the end of the year of failure should the 
product fail within 2 years. In return it collects a premium of 40 now and a second 
premium of 20 at time 1, if failure did not occur in the first year. The interest rate 
is 25%. 

(a) Find $E(L)$, $\mathrm{Var}(L)$, $\mathrm{Var}({}_1L)$. 

(b) Find the probability that, for a given contract, the premiums collected will be 
sufficient to pay the benefits. 


15.8 A 3-year endowment insurance on (60) provides death benefits of 100 payable at the 
end of the year of death, if this occurs within 3 years, plus a pure endowment of 100 at 
age 63 if the insured is then alive. This is purchased by two premiums of 30 payable 
at age 60 and 61. You are given that $q_{60} = 0.2$, $q_{61} = 0.4$, and the interest rate is a 
constant 25%. Give the complete distribution of the random variable $L$ and use this to 
find ${}_0V$. 


15.9 An insurer charges a single premium of $E(X) + 0.1\sqrt{\mathrm{Var}(X)}$ for a contract with present 
value of benefits equal to $X$. Use a normal approximation. 

(a) How many contracts must be sold so that the probability is 95% that premiums 
cover benefits? 

(b) What is the probability that premiums cover benefits if 225 contracts are sold? 


15.10 A failure time has a hazard rate of 

2 
t) = ——. 
Hmn] 

A contract provides for a benefit of 1 at time 10 provided that failure occurs before 
time 10 (so, for example, if failure occurs at time 6, the benefit is not paid until 4 
years later). You are given that $v(10) = 0.6$. Find the expectation and variance of the 
present value of the benefits. 


15.11 Consider a 3-year life annuity on (60). The sequence of payments (beginning at time 
0) is 4, 2, 1. You are given that $q_{60} = 0.2$, $q_{61} = 0.4$, $q_{62} = 0.5$. The interest rate 
is a constant 25%. The insurer sells 100 such contracts, charging $(1 + r)$ times the 
equivalence principle single premium. What should $r$ be so that there is at least a 95% 
chance that premiums will cover the benefits? (Use a normal approximation.) 


15.12 A 2-year term insurance contract sold to two lives, (70) and (71), provides for benefits 
payable at the end of the year of the first death. The benefit paid is 40 in the first year 
and 50 in the second year. You are given that $q_{70} = 0.4$, $q_{71} = 0.5$, $q_{72} = 0.6$, and the 
interest rate is 100%. Let $Z$ denote the present value of the benefits. 

(a) Find $E(Z)$ and $\mathrm{Var}(Z)$. 

(b) Suppose the contract is to be paid for by level annual premiums of $P$ payable for 
2 years. What should $P$ be if the equivalence principle is used, and what is the 
resulting probability that $L \le 0$? 

(c) What is the smallest amount that $P$ could be if we want a probability of at least 
0.25 that the loss $L$ will be less than or equal to 0? 

15.13 A 3-year endowment insurance on (70) provides death benefits of 400 payable at the 
end of the year of death, if this occurs within 3 years, plus a pure endowment of 200 
at age 73, if the insured is then alive. This is purchased by three level premiums of 40. 
You are given that $q_{70} = 0.2$, $q_{71} = 0.4$, $q_{72} = 0.5$, and the interest rate is a constant 
100%. 

(a) Give the complete distributions for the random variable $L$, and the random variable 
${}_1L$. Use the latter to find the reserve at time 1. 

(b) What should the level premium be if the insurer wants the smallest premium such 
that the probability of a positive loss is less than 25%? 


15.14 Use Equation (15.12) to calculate the variance in Example 15.3. 


15.15 You are given that $q_x = 0.1$, $q_{x+1} = 0.2$, $q_y = 0.2$, $q_{y+1} = 0.3$, and the rate of 
interest is a constant 25%. Find the variance of the present value of the benefits in 
each of the following contracts: 

(a) A payment of 100 is made at the end of the year of the second death, provided 
this occurs within 2 years. 

(b) Three annual payments of 100 are made, the first at time 0, provided that at least 
one of (x) or (y) is alive. 


15.16 A failure time $T$ has a uniform distribution on [0,30]. The force of interest is a constant 
5%. A single premium insurance contract pays 1 unit at the moment of failure. It is 
desired to have at least a 95% probability that premiums will cover benefits on a large 
group of contracts. Answer the following assuming a normal approximation. 

(a) Suppose the security loading is 20%. How many contracts should be sold? 

(b) It is estimated that 100 contracts will be sold. What should the security loading 
be? 


15.17 Redo Example 15.2(b) only now assuming that the interest rate is 25% in the first year 
and 50% in the second year. 


Type B exercises 


15.18 A failure time has a constant hazard rate of $\mu$ and the force of interest is a constant 
$\delta$. A contract pays $e^{\gamma t}$ at the moment of failure for failure at time $t$, where $\gamma < \mu + \delta$. 
Find the expectation and variance of the present value of the benefits as a function of 
$\mu$, $\delta$ and $\gamma$. 


15.19 Suppose that the force of mortality $\mu$ is a constant 0.04 and the force of interest is a 
constant 0.06. 

(a) An insurer sells to one individual a single-premium, whole-life policy, with 100 
payable at the moment of death, and also sells, to an independent life, a life 
annuity with benefits payable continuously at the rate of $c$ per year. If $W$ denotes 
the present value of the benefits on the two contracts combined, find $\mathrm{Var}(W)$ as a 
function of $c$. 

(b) Suppose now that the two contracts in (a) are sold to the same individual. What is 
the variance of $W$ in this case? Is it less than, greater than or equal to the variance 
in (a)? Explain why. For what values of $c$, if any, will the variance be equal to 0? 
Explain. 


15.20 A failure time $T$ has a p.d.f. of $f(t) = (12 - t)/72$ for $0 < t < 12$. An annuity contract 
provides for continuous payments at the annual rate of 1, which stop at failure, or at 
time 6 if earlier. The interest rate is 0. Let $Y$ denote the present value of the benefits. 

(a) Express $Y$ as a function of $T$. 

(b) Calculate $E(Y)$ and $\mathrm{Var}(Y)$. 

(c) The single premium charged for this contract is the expected value plus 20%. 
What is the probability that the premium will cover the benefits? 


15.21 For a certain failure time $T$, an insurance contract pays benefits at the end of the year of 
failure. You are given that $s_T(9) = 0.8$, $s_T(10) = 0.7$, $E({}_{10}L) = 70$, $\mathrm{Var}({}_{10}L) = 1000$. 
The amount payable at time 10 for failure between time 9 and time 10 is 100. The 
interest rate is a constant 20%. Find $\mathrm{Var}({}_9L)$. 


15.22 A discrete failure time $T$ has a constant hazard rate of $\lambda(k) = 1/2$. There is a constant 
interest rate of 100%. An insurance contract pays a benefit of 1 unit at the end 
of the year of failure. Level annual equivalence principle premiums are payable 
prior to failure. Using formula (15.28), show that for any nonnegative integer $r$, 
$\mathrm{Var}({}_rL) = 1/15$. 


15.23 Consider two independent lives (x), (y), where $T(x)$ is uniform on [0,1] and $T(y)$ is 
uniform on [0,2]. 

(a) Calculate the p.d.f. of the last survivor failure time $T(\overline{xy})$. 

(b) An annuity provides continuous payments for as long as either (x) or (y) is alive. 
The rate of payment at time $t$ is $3t^2$. The rate of interest is zero. If $Y$ denotes the 
present value of the benefits for this annuity, find $E(Y)$ and $\mathrm{Var}(Y)$. 


15.24 An endowment insurance on (x) provides for 1000 at the end of the year of death, 
provided this occurs within 4 years, and 1000 at time 4 if (x) is still alive. Net 
level annual premiums are payable for 4 years. You are given that $q_x = 0.05$, $q_{x+1} = 
0.08$, $q_{x+2} = 0.10$. Interest rates are 5% for the first 2 years and 6% thereafter. Find 
$\mathrm{Var}({}_1L)$ in two ways: first, by finding the exact distribution of ${}_1L$; and second, by 
finding ${}_2V$ and ${}_3V$ and then using Equation (15.28). 


15.25 A failure time takes the values 1 or 2 each with probability 1/2. The interest rate is 0. 
Insurance contract A pays 5 at failure. Insurance contract B pays 1 if failure occurs 
at time 1, and 5 if failure occurs at time 2. Show that using the variance premium 
principle with $\beta = 1$, the premium for contract B is higher than that for contract A. 


Spreadsheet exercise 


15.26 Modify the spreadsheet of Chapter 6 to calculate $\mathrm{Var}({}_rL)$. 


16 


Simplifications under level 
benefit contracts 


16.1 Introduction 


The calculation of variances and other distributional features simplifies considerably when 
we have level benefits and constant interest. In fact, we can write down formulas for exact 
distributions of the major random variables of interest. Throughout this chapter, we consider 
the following setup. We have a general failure time $T$. We will consider insurances paying 
a level amount upon failure, either at the end of the year of failure or at the moment of 
such, and we will consider annuities paid prior to the failure time $T$ with either a level 
payment or continuous payments at a level rate. In addition, we assume a constant force of 
interest $\delta$. 

By taking $T$ to be $T(x)$, this will apply to level benefit, whole-life insurances and to level 
benefit whole-life annuities. By taking $T = \min\{T(x), n\}$, this will apply to level benefit 
n-year endowment policies and to level benefit n-year temporary life annuities. Our assumption 
does not apply to term insurance, even when there is a level benefit during the term, since 
the benefit drops to zero after the expiration of the contract. However, in Section 16.5 we do 
illustrate that the calculation of exact distributions is possible for term or deferred insurances 
with a level death benefit paid over the benefit period. 


16.2 Variance calculations in the continuous case 


It is convenient to begin with a continuous failure time $T$. 


16.2.1 Insurances 


Consider an insurance policy paying 1 at the moment of failure. The discount function is 
given by $v(t) = v^t = e^{-\delta t}$. Let $Z$ be the present value of the benefits. In this case there is little 
simplification, and we know from the previous chapter that 

$$\bar{A}_T = E(Z) = E(v^T), \qquad \mathrm{Var}(Z) = E(v^{2T}) - (\bar{A}_T)^2. \tag{16.1}$$


16.2.2 Annuities 


Consider an annuity with continuous payments at the rate of 1 per year, made prior to the 
occurrence of failure. If $Y$ is the present value of the benefits, then 

$$Y = \bar{a}_{\overline{T}|} = \frac{1 - v^T}{\delta} = \frac{1 - Z}{\delta}, \tag{16.2}$$

so that 

$$\bar{a}_T = E(Y) = \frac{1 - E(Z)}{\delta}, \qquad \mathrm{Var}(Y) = \frac{\mathrm{Var}(Z)}{\delta^2}. \tag{16.3}$$


For the case where T = min{T(x),n}, we have already seen the first part of (16.3). This 
was the continuous ‘endowment identity’ given at the end of Section 8.8. 


16.2.3 Prospective losses 

Consider a contract that pays 1 unit at the moment of failure and has continuous level premiums 
at an annual rate of $\pi$ payable prior to failure. (As a practical matter, this means that we are 
dealing with a net premium model that ignores expenses, which are unlikely to be level.) The 
prospective loss at time $t$ is 

$${}_tL = Z \circ t - \pi\,(Y \circ t), \tag{16.4}$$

where $Z \circ t = v^{T \circ t}$ and $Y \circ t = (1 - Z \circ t)/\delta$. This leads to 

$${}_tL = \left(1 + \frac{\pi}{\delta}\right) v^{T \circ t} - \frac{\pi}{\delta}. \tag{16.5}$$

This greatly simplifies the calculation of the variance of the prospective loss. Compare the 
following with (15.28). 

$$\mathrm{Var}({}_tL) = \left(1 + \frac{\pi}{\delta}\right)^2 \mathrm{Var}(v^{T \circ t}). \tag{16.6}$$


16.2.4 Using equivalence principle premiums 


Suppose that $\pi$ is an equivalence principle premium. Then $\pi E(Y) = E(Z)$, so that from (16.3), 

$$\pi = \frac{1}{\bar{a}_T} - \delta. \tag{16.7}$$

Taking $t = 0$ in (16.6), 

$$\mathrm{Var}(L) = \frac{\mathrm{Var}(v^T)}{(\delta\,\bar{a}_T)^2} = \frac{\mathrm{Var}(Z)}{(1 - \bar{A}_T)^2}, \tag{16.8}$$

showing that 

$$\mathrm{Var}(Z) < \mathrm{Var}(L).$$

This reflects the fact that there is more risk involved in selling an insurance contract where 
premiums are payable over the entire life of the contract as opposed to the single-premium 
case. For failure occurring early, the insurer not only loses interest but will have collected 
relatively small amounts in premiums. 
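
A quick numerical check of this comparison is sketched below. It assumes, purely for illustration, the constant-force setting of Example 15.8 (hazard rate 0.04 and force of interest 0.06), for which the moments of the discounted benefit have closed forms; the variable names are not from the text. The variance of the loss is computed both from the level-premium variance formula and from the equivalence-premium shortcut, and the two should agree.

```python
mu, delta = 0.04, 0.06   # illustrative constant hazard rate and force of interest

A_bar = mu / (mu + delta)                    # E(Z) = E(v^T)
var_Z = mu / (mu + 2 * delta) - A_bar ** 2   # Var(Z)
a_bar = (1 - A_bar) / delta                  # expected annuity value, from the endowment identity
pi = 1 / a_bar - delta                       # equivalence principle premium rate

var_L_general = (1 + pi / delta) ** 2 * var_Z   # variance formula for level premiums at t = 0
var_L_shortcut = var_Z / (1 - A_bar) ** 2       # shortcut valid for equivalence premiums
print(round(var_Z, 4), round(var_L_general, 4), round(var_L_shortcut, 4))
```

With these inputs the single-premium variance is 0.09 while the level-premium variance is 0.25, so paying by level premiums nearly triples the variance in this case.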

For a final formula, express $Z \circ t$ in terms of $Y \circ t$ in (16.4). Then 

$${}_tL = 1 - (\pi + \delta)(Y \circ t).$$

If $\pi$ is an equivalence principle premium, we can substitute from (16.7) and take expectations 
to give a simple formula for the reserve at time $t$. 

$${}_tV = 1 - \frac{\bar{a}_{T \circ t}}{\bar{a}_T}. \tag{16.9}$$

(For $T(x)$, we obtained the discrete version of this in (6.18).) 


16.3 Variance calculations in the discrete case 


We now consider the case where failure benefits are paid at the end of the year of failure, and 
annuity benefits and premiums are payable yearly. All of the formulas in Section 16.2 have 
discrete counterparts, which for the most part are obtained by replacing $T$ by $\tilde{T}$, $\bar{A}$ by $A$, $\bar{a}$ 
by $\ddot{a}$ and $\delta$ by $d$. We will leave the formal derivations to the reader, but will list the formulas 
with the corresponding equation numbers as in Section 16.2, only with a prime to denote the 
discrete case. 

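If $Z$ denotes the present value of the benefits for an insurance paying 1 unit at the end of the year of failure,

\[ A_T = E(Z) = E(v^{\tilde{T}}), \qquad \operatorname{Var}(Z) = E(v^{2\tilde{T}}) - (A_T)^2. \tag{16.1'} \]

If $Y$ denotes the present value of an annuity paying 1 unit yearly provided that failure has not occurred,

\[ Y = \ddot{a}_{\overline{\tilde{T}}|} = \frac{1 - v^{\tilde{T}}}{d} = \frac{1 - Z}{d}, \tag{16.2'} \]

\[ \ddot{a}_T = E(Y) = \frac{1 - A_T}{d}, \qquad \operatorname{Var}(Y) = \frac{\operatorname{Var}(Z)}{d^2}. \tag{16.3'} \]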
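
Now consider a contract which pays 1 unit at the end of the year of failure and has level annual premiums of $\pi$ payable prior to failure. Then for any positive integer $k$,

\[ {}_kL = \left(1 + \frac{\pi}{d}\right) v^{\tilde{T} \circ k} - \frac{\pi}{d}, \tag{16.5'} \]

\[ \operatorname{Var}({}_kL) = \left(1 + \frac{\pi}{d}\right)^2 \operatorname{Var}\left(v^{\tilde{T} \circ k}\right). \tag{16.6'} \]

Suppose that $\pi$ is an equivalence principle premium. Then

\[ \pi = \frac{1}{\ddot{a}_T} - d, \tag{16.7'} \]

\[ \operatorname{Var}(L) = \frac{\operatorname{Var}(v^{\tilde{T}})}{(d\,\ddot{a}_T)^2} = \frac{\operatorname{Var}(Z)}{(1 - A_T)^2}, \tag{16.8'} \]

\[ {}_kV = 1 - \frac{\ddot{a}_{T \circ k}}{\ddot{a}_T}. \tag{16.9'} \]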


One could possibly consider expenses when using the above simplified prospective loss 
formulas if the difference is only in the first year, as the following example shows. 


Example 16.1 An insurance policy provides 1000 at the end of the year of death with level 
premiums payable for life. There are expenses in the first year of 60% of the premium plus
50, and in the subsequent years of 15% of the premium plus 20. In addition, there is a death 
benefit settlement expense of 50. You are given that E(Z) = 0.4 and Var(Z) = 0.10 where Z is 
the present value of 1 unit of death benefit. The rate of discount is a constant 0.06. Find Var 
(L) where L is calculated using expense-augmented premiums, and including all expenses. 


Solution. We first compute the expense-augmented premium $G$ from

\[ G\ddot{a}_x = 0.45G + 0.15G\ddot{a}_x + 30 + 20\ddot{a}_x + 1050A_x, \]

so that

\[ G = \frac{1050A_x + 30 + 20\ddot{a}_x}{0.85\ddot{a}_x - 0.45}. \]

We know that $\ddot{a}_x = (1 - A_x)/d = (1 - 0.4)/0.06 = 10$, and substituting in the above, $G = 80.75$. This means that the total inflow after the first year is $0.85(80.75) - 20 = 48.64$.

Now consider a policy where the total inflow was 48.64 in every year. The value of $L$ from that contract would differ from that on the one in question only by a constant amount, namely the extra amount in the first year due to higher expenses. Therefore, the variance of $L$ would be the same, and we can use formula (16.6'), with $\pi = 48.64$, and the 1 replaced by the death benefit of 1050. From (16.6'),


\[ \operatorname{Var}(L) = \left(1050 + \frac{48.64}{0.06}\right)^2 (0.10) = 346\,208. \]
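
The arithmetic of Example 16.1 can be retraced with a short script. This is only a sketch: the function name and its default arguments are illustrative choices, and the script carries the premium at full precision, whereas the text rounds the yearly inflow to 48.64 before squaring, so the last digits of the variance differ slightly.

```python
def example_16_1(A=0.4, var_Z=0.10, d=0.06):
    """Retrace Example 16.1: premium G, yearly inflow, and Var(L)."""
    a_due = (1.0 - A) / d                      # annuity value (1 - A)/d
    # Solve G*a = 0.45*G + 0.15*G*a + 30 + 20*a + 1050*A for G.
    G = (1050.0 * A + 30.0 + 20.0 * a_due) / (0.85 * a_due - 0.45)
    inflow = 0.85 * G - 20.0                   # net inflow after the first year
    var_L = (1050.0 + inflow / d) ** 2 * var_Z # (16.6') with benefit 1050
    return G, inflow, var_L


G, inflow, var_L = example_16_1()
# Same formula with the inflow rounded to 48.64, as in the text.
var_L_rounded = (1050.0 + 48.64 / 0.06) ** 2 * 0.10
print(f"G = {G:.2f}, inflow = {inflow:.2f}")
print(f"Var(L) = {var_L:.0f} (unrounded), {var_L_rounded:.0f} (rounded inflow)")
```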


16.4 Exact distributions


In this section, we calculate the exact distributions for Z, Y and L. In each case, we will derive 
distribution functions. These will be given for values between the greatest lower bound and 
least upper bound of the values. (We know that $F$ takes the value 0 for arguments less than the greatest lower bound, and 1 for arguments greater than the least upper bound.) It is convenient here to introduce some new notation. For any random variable $X$, let

\[ \bar{F}_X(x) = P(X < x), \qquad \bar{s}_X(x) = P(X \ge x). \]

Of course, when $X$ is continuous, $\bar{F} = F$ and $\bar{s} = s$.

The distribution functions we want are all easily expressed in terms of the distribution of $T$. Let $N$ denote the least upper bound of the values of $T$. In the case that $N = \infty$ (as for example when $T$ is exponential), the term $v^N$ in the formulas below will be equal to 0.


16.4.1 The distribution of Z 


The distribution of Z is given by 


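\[ F_Z(z) = P(v^T \le z) = P\left(T \ge -\frac{\log z}{\delta}\right) = \bar{s}_T\left(-\frac{\log z}{\delta}\right), \qquad v^N \le z < 1. \tag{16.10} \]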


16.4.2 The distribution of Y 


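First note that arguing as in (16.10) gives

\[ \bar{F}_Z(z) = s_T\left(-\frac{\log z}{\delta}\right). \]

Using (16.2),

\[ F_Y(y) = P(Z \ge 1 - \delta y) = 1 - \bar{F}_Z(1 - \delta y). \]

Substituting from above,

\[ F_Y(y) = 1 - s_T\left(-\frac{\log(1 - \delta y)}{\delta}\right), \qquad 0 \le y < \frac{1 - v^N}{\delta}. \tag{16.11} \]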


16.4.3 The distribution of L 


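The minimum value of $L$ will be

\[ \left(1 + \frac{\pi}{\delta}\right) v^N - \frac{\pi}{\delta}, \]

occurring for failure at time $N$, and its maximum value will be 1, occurring for failure at time 0. Using (16.5),

\[ F_L(u) = P\left(\left(1 + \frac{\pi}{\delta}\right) v^T - \frac{\pi}{\delta} \le u\right) = P\left(v^T \le \frac{\delta u + \pi}{\delta + \pi}\right) = \bar{s}_T\left(-\frac{1}{\delta}\log\frac{\delta u + \pi}{\delta + \pi}\right), \qquad \left(1 + \frac{\pi}{\delta}\right) v^N - \frac{\pi}{\delta} \le u < 1. \tag{16.12} \]

For the more general case, $F_{{}_tL}(u)$ is given by the same formula, but with $\bar{s}_{T \circ t}$ replacing $\bar{s}_T$ and $v^{N-t}$ replacing $v^N$.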


16.4.4 The case where T is exponentially distributed 


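In the particular case where $T$ is exponential with constant hazard function $\mu$, we know that $s_T(t) = \bar{s}_T(t) = e^{-\mu t}$ and $N = \infty$. The above formulas simplify to

\[ F_Z(z) = z^{\mu/\delta}, \qquad 0 < z < 1, \tag{16.13} \]

\[ F_Y(y) = 1 - (1 - \delta y)^{\mu/\delta}, \qquad 0 < y < \frac{1}{\delta}, \tag{16.14} \]

\[ F_L(u) = \left(\frac{\delta u + \pi}{\delta + \pi}\right)^{\mu/\delta}, \qquad -\frac{\pi}{\delta} \le u \le 1. \tag{16.15} \]

It is of interest to observe that if $\mu = \delta$ in the exponential case, the exponent in the formulas above equals 1 so that $Z$, $Y$ and $L$ are all uniform random variables.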


Example 16.2 A company decides to add 20% to its equivalence principle premiums 
as a protection against unfavourable experience. In each of the following cases, find the 
probability that premiums will cover claims. Suppose that $T$ is exponential with $\mu = 0.04$ and that $\delta = 0.06$.


(a) A single-premium annuity providing continuous payments at the annual rate of 1 prior
to failure. 


(b) A contract paying 1 unit at failure, with level premiums payable continuously prior to 
failure. 


Solution. (a) The equivalence principle premium is $1/(\mu + \delta) = 10$. The actual premium charged will be 12. From (16.14),

\[ P(Y \le 12) = 1 - (0.28)^{2/3} = 0.57. \]

(b) The equivalence principle annual premium rate is just $\mu = 0.04$, so the actual premium rate charged is 0.048. The probability that premiums cover claims is just

\[ P(L \le 0) = \left(\frac{0.048}{0.108}\right)^{2/3} = 0.58. \]
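
Both parts of Example 16.2 follow directly from (16.14) and (16.15), and the sketch below evaluates them. The function and variable names are illustrative assumptions, and the formulas are valid only for an exponential failure time with constant hazard.

```python
def F_Y_exp(y, mu, delta):
    """(16.14): P(Y <= y) for exponential T, valid for 0 < y < 1/delta."""
    return 1.0 - (1.0 - delta * y) ** (mu / delta)


def F_L_exp(u, pi, mu, delta):
    """(16.15): P(L <= u) for exponential T, valid for -pi/delta <= u <= 1."""
    return ((delta * u + pi) / (delta + pi)) ** (mu / delta)


mu, delta, loading = 0.04, 0.06, 1.2
single_premium = loading / (mu + delta)   # 1.2 times the net single premium
premium_rate = loading * mu               # 1.2 times the net premium rate

p_a = F_Y_exp(single_premium, mu, delta)      # part (a)
p_b = F_L_exp(0.0, premium_rate, mu, delta)   # part (b)
print(f"(a) {p_a:.2f}   (b) {p_b:.2f}")
```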


16.5 Some non-level benefit examples 


It is also possible to obtain exact distributions in some simple cases involving non-level 
benefits such as term or deferred insurance. 


16.5.1 Term insurance 


Consider a contract that pays 1 at failure, provided failure occurs within $n$ years. In order to handle failure times that are not continuous, we adopt the convention that a benefit is paid for failure at exact time $n$. We will compute the distribution of $Z$, the present value of the benefits. The minimum positive value of $Z$ is $e^{-\delta n}$, occurring for death at time $n$. Since nothing is paid for death strictly after time $n$, which occurs with probability $s_T(n)$, we can easily obtain the term distribution from the whole life distribution. Pick up the probability mass to the left of $z = e^{-\delta n}$ and set it down as a point mass at the point 0, as illustrated in Figure 16.1. From this, utilizing formula (16.10), we can read off the distribution of $Z$.

\[ F_Z(z) = \begin{cases} s_T(n), & 0 \le z < e^{-\delta n}, \\ \bar{s}_T\left(-\dfrac{\log z}{\delta}\right), & e^{-\delta n} \le z < 1. \end{cases} \tag{16.16} \]


Example 16.3 Redo part (b) of Example 16.2, but now assuming n-year term insurance for 
n = 15 and 10. Level premiums are payable continuously for n years. 


Solution. The equivalence principle premium rate is still $\mu = 0.04$, so the actual premium rate charged is still 0.048 and $\pi/\delta = 0.8$. For the contract in Example 16.2, the value of $L$ when $T = n$ is $1.8e^{-0.06n} - 0.8$. For $n = 15$, this is negative. This means that $L$ becomes negative at some point prior to expiration of the term contract, so the probability that premiums cover claims is 0.58, exactly the same as it was in the constant benefit case. When $n = 10$, the value of $L$ for the contract in Example 16.2 is positive. Therefore, in order that $L$ in this example be negative, it is necessary that $T > 10$, so that no benefits are paid. The probability of this is $e^{-0.4} = 0.67$.
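
The case distinction in Example 16.3 can be written as a small function. This is a sketch under the assumptions of the example (exponential failure time, premium rate equal to 1.2 times $\mu$); the function name and its default arguments are illustrative.

```python
import math


def prob_premiums_cover_term(n, mu=0.04, delta=0.06, loading=1.2):
    """P(L <= 0) for n-year term insurance with continuous level premiums."""
    pi = loading * mu
    ratio = pi / delta
    # Value of L on the whole-life contract if failure occurs exactly at n.
    loss_at_n = (1.0 + ratio) * math.exp(-delta * n) - ratio
    if loss_at_n <= 0.0:
        # L already turns negative before the term expires, so the answer
        # coincides with the whole-life probability from (16.15).
        return (pi / (delta + pi)) ** (mu / delta)
    # Otherwise premiums cover claims only if no benefit is paid at all.
    return math.exp(-mu * n)


p15 = prob_premiums_cover_term(15)
p10 = prob_premiums_cover_term(10)
print(f"n = 15: {p15:.2f}   n = 10: {p10:.2f}")
```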


16.5.2 Deferred insurance 


A similar example is provided by deferred insurance. Consider a contract that pays 1 at failure, 
provided failure occurs after time n. We now adopt the convention that nothing is paid for 
failure at exact time $n$. In view of this convention, there need not be a maximum value of $Z$, but $e^{-\delta n}$ is certainly an upper bound, since any benefits will be paid at a time later than $n$.


Figure 16.1 Graph of $f_Z(z)$ for various types of insurance


Refer again to Figure 16.1. Since nothing is paid for death at or before time $n$, which occurs with probability $F_T(n)$, the probability mass to the right of $e^{-\delta n}$ is picked up and set down as a point mass at 0.

Utilizing (16.10), we can write

\[ F_Z(z) = \begin{cases} F_T(n) + \bar{s}_T\left(-\dfrac{\log z}{\delta}\right), & 0 \le z < e^{-\delta n}, \\ 1, & e^{-\delta n} \le z < 1. \end{cases} \]


16.5.3 An annual premium policy


We next investigate a more complicated case where we compute the exact distribution of $L$ in an annual premium policy. Consider a failure time $T$ that is unbounded (i.e. $N = \infty$). We consider insurances which have premiums payable continuously at a level annual rate $\pi$ for the duration of the contract. We will compare the distribution of $L$ for a contract that pays 1 unit on failure or at time $n$ if earlier, and a contract that pays 1 unit at failure provided this occurs within $n$ years. These then correspond respectively to endowment and term insurance. To simplify the notation, let

\[ b_0 = -\frac{\pi}{\delta} + \left(1 + \frac{\pi}{\delta}\right) v^n, \qquad c_0 = -\frac{\pi}{\delta} + \frac{\pi}{\delta} v^n. \]

In both cases, the value of $L$ is given by (16.5) (with $t = 0$), provided that $T$ takes a value less than $n$, which corresponds to $L$ taking a value greater than $b_0$. So for any interval $(a, b)$ with $a > b_0$, we have that $P(a < L < b)$ is the same in both cases and this value can be calculated directly from (16.12).

Figure 16.2 Graph of $f_L(u)$ for various types of insurance (panels: whole life, $n$-year term, $n$-year endowment)


Consider the remaining probability of $s_T(n)$. For the endowment contract, this will all be concentrated at the single point $b_0$. For the term contract, it will all be concentrated at the single point $c_0$, which lies in the interval $(-\pi/\delta, b_0)$. As $n$ increases to $\infty$ both $b_0$ and $c_0$ approach $-\pi/\delta$ and the distribution approaches the whole-life case as given in (16.12).

The various density functions are compared in Figure 16.2. The probability mass to the left of the point $b_0$ on the whole-life graph is picked up and set down as a point mass at $b_0$ for the endowment policy, or at $c_0$ for the term policy.


Exercises 


Type A exercises 


16.1 For a certain failure time $T$, an insurance contract pays 1 at the moment of failure, and has level premiums payable continuously prior to failure at the annual rate of 0.06. You are given that

\[ \operatorname{Var}(Z) = 0.064, \qquad \operatorname{Var}(Y) = 10, \]

where $Z$ is the present value of the failure benefits, and $Y$ is the present value of an annuity contract with continuous payments at the annual rate of 1, payable prior to failure. Find $\operatorname{Var}(L)$.


16.2 For a certain failure time $T$, an insurance contract pays 1 at the moment of failure. Level equivalence principle premiums are payable prior to failure. If $E(Z) = 0.6$ and $\operatorname{Var}(L) = 2$, find $\operatorname{Var}(Z)$.


16.3 A whole-life insurance contract provides for 1 unit payable at the moment of death. Level premiums are payable continuously for life at the annual rate of 0.08. The force of mortality is a constant 0.05, and the force of interest is a constant 0.10. How many contracts must be sold in order that there is at least a 95% chance that the total premiums on all these contracts will cover the total benefits? Use a normal approximation.


16.4 An insurer sells 100 whole-life insurance policies each providing for 1 unit payable at the moment of death. Level premiums are payable continuously for life. The force of interest is a constant 0.06, and the force of mortality is a constant 0.04. What should the rate of premium payment be on each policy, in order that there is a 95% chance that total premiums on all 100 policies will cover the total benefits on all policies? Use a normal approximation.


Type B exercises 


16.5 (See Exercise 13.8.) A continuous failure time has a density function $f_T(t) = \beta^2 t e^{-\beta t}$ (a gamma distribution with first parameter 2). The force of interest is a constant $\delta$. Find expressions in terms of $\beta$ and $\delta$ for: (a) $\bar{A}_T$; (b) $\bar{a}_T$; (c) the net annual rate of premium payment when premiums are payable continuously prior to failure, for an insurance paying 1 at failure and (d) the reserve at time $k$ for the contract in (c).


16.6 An insurance contract, based on the failure time $T$, pays 1 unit at the moment of failure provided this occurs within 5 years. Nothing is paid for failure after that time. The force of interest is a constant 0.1. If $T$ has the hazard function


2
t = ——,
Ur (0) 107
find the probability that the present value of the benefits is strictly positive, but less than or equal to e~?.


16.7 Suppose $T$ is uniform on [0, 10], and $\delta = 0.05$. A contract pays 1 unit at failure. Level premiums are payable continuously for 10 years. The premiums charged are the net premiums plus 20%.

(a) Find the probability that $L < 0$.

(b) Suppose now that, instead of 1.2 times equivalence principle premiums, the company charges premiums at the rate of 0.05 per year. What is the probability that $L < 0$?


16.8 A failure time $T$ is uniformly distributed on the interval [0, 20]. The force of interest is a constant 0.05.

(a) A deferred insurance contract provides for a payment of 1 at the moment of failure provided that this occurs after time 5. If $Z$ is the present value of the benefits, find the 80th percentile of $Z$. That is, find the point $z$ such that $P(Z \le z) = 0.80$.

(b) Find the 80th percentile of $Z$ when the contract is a term insurance policy that pays 1 unit if failure occurs in the first 5 years.


16.9 Consider a deferred term insurance contract. It provides for 1 unit payable at the moment of failure provided that this occurs between $N$ and $2N$ years from now. Nothing is paid if failure occurs in the first $N$ years or after $2N$ years. You are given that the force of failure $\mu$ and the force of interest $\delta$ are constants such that $\mu = \delta/2$. Moreover, you are given that $e^{-\delta N} = 0.36$. If $Z$ is the present value of the benefits, calculate $F_Z(z)$ for all nonnegative values of $z$.


16.10 An insurance contract provides for $e^{\delta t}$ payable at failure, should this occur at time $t$. Net level premiums are payable continuously prior to failure. The force of interest is a constant $\delta$. Show that


16.11 An insurance policy provides a benefit of 1 at the moment of failure provided this occurs after 10 years. The hazard rate of the failure time is a constant 0.10 and the force of interest is a constant 0.05. If $Z$ is the present value of the benefits, find the probability that: (a) $0 < Z \le 0.5$, (b) 0 < Z < 0.5 and (c) < Z $\le$ 0.7.


16.12 An insurance policy provides a benefit of 1 at the moment of failure plus a pure endowment of 1 at time 10. This is purchased by level premiums payable continuously for 10 years. The premium is the net premium plus 10%. There is a constant force of mortality of 0.05 and a constant force of interest of 0.05. Find (a) $P(-1/3 < L \le 1/2)$ and (b) $P(-1/4 < L \le 1/2)$.


16.13 Let $T$ be any failure time, and assume constant interest.

(a) Show that

\[ E\left(\bar{a}_{\overline{T}|}^{\,2}\right) = \frac{2}{\delta}\left(\bar{a}_T - {}^2\bar{a}_T\right), \]

where the superscript 2 on the right-hand side indicates calculation at a force of interest equal to $2\delta$ or equivalently with $v$ replaced by $v^2$.

(b) Is the following formula true? If not, give a correct version.
D RR E
EIS rae


16.14 Modify the formula in Section 16.5.2 in the case that there is a benefit paid for failure at exact time $n$.

17 


The minimum failure time 


17.1 Introduction 


Suppose that $T_1, T_2, \dots, T_m$ are failure times defined on the same sample space. In this chapter, we investigate the random variable

\[ T = \min\{T_1, T_2, \dots, T_m\}. \]


In other words, T is the time of the first failure to occur among the m different failure times that 
are possible. In particular, this will recast the material of Chapters 10 and 11 into a stochastic 
framework, and show that both joint-life theory and multiple-decrement theory are special 
cases of this general problem. In addition, it will provide more rigorous arguments for some 
of the results of those chapters that were obtained in an intuitive fashion. Finally it will deal 
with the important cases where the failure times need not be independent. 

In the joint-life case, where we have a group of $m$ lives numbered $1, 2, \dots, m$, we can take $T_i$ to be the future lifetime of the $i$th life, so that $T$ is the failure time of the joint $m$-life status. In the multiple-decrement context, we can take $T_i$ to be the time of failure from cause $i$ in the associated single-decrement setting, so it is the failure time of the $i$th cause, assuming no other causes of failure are operating. Then, the random variable $T$ is the time of failure in the multiple-decrement model. (In the machine analogy of Section 11.6, $T_i$ would be the failure time of the $i$th part.)


17.2 Joint distributions 


We wish to expand somewhat on our brief description for the case $m = 2$ given in Appendix A. Suppose that each $T_i$ is continuous. There are various ways of describing the joint distribution. We can do so by the joint density function $f_{T_1, T_2, \dots, T_m}(t_1, t_2, \dots, t_m)$, or alternatively by the joint distribution function

\[ F_{T_1, T_2, \dots, T_m}(t_1, t_2, \dots, t_m) = P[T_1 \le t_1, T_2 \le t_2, \dots, T_m \le t_m], \]

or the joint survival function

\[ s_{T_1, T_2, \dots, T_m}(t_1, t_2, \dots, t_m) = P[T_1 > t_1, T_2 > t_2, \dots, T_m > t_m]. \]

To simplify the notation, we will often omit the subscripts and just write $f$, $F$ or $s$ when no confusion arises.

The reader is cautioned that $s(t_1, t_2, \dots, t_m) \ne 1 - F(t_1, t_2, \dots, t_m)$ when $m > 1$.

As in the one-dimensional case, we integrate to obtain the distribution or survival functions 
from the density function, and differentiate to go in the other direction. We must, however, 
use multiple integrals and partial derivatives. For example, with m = 3, 


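\[ F(t_1, t_2, t_3) = \int_0^{t_1}\int_0^{t_2}\int_0^{t_3} f(u, v, w)\,dw\,dv\,du, \]

\[ s(t_1, t_2, t_3) = \int_{t_1}^{\infty}\int_{t_2}^{\infty}\int_{t_3}^{\infty} f(u, v, w)\,dw\,dv\,du. \]

Now differentiate the latter expression with respect to $t_1$. The fundamental theorem of calculus tells us to replace the variable $u$ in the integrand with $t_1$ and affix a minus sign since $t_1$ is a lower limit. The integrand consists of the second two integrals. The result is

\[ \frac{\partial}{\partial t_1} s(t_1, t_2, t_3) = -\int_{t_2}^{\infty}\int_{t_3}^{\infty} f(t_1, v, w)\,dw\,dv, \tag{17.1} \]

and similarly for $t_2$ and $t_3$.

After two more iterations of the procedure, we have

\[ f(t_1, t_2, t_3) = (-1)^3 \frac{\partial}{\partial t_1}\frac{\partial}{\partial t_2}\frac{\partial}{\partial t_3} s(t_1, t_2, t_3). \]

Similarly, we can derive

\[ f(t_1, t_2, t_3) = \frac{\partial}{\partial t_1}\frac{\partial}{\partial t_2}\frac{\partial}{\partial t_3} F(t_1, t_2, t_3). \]

Analogous expressions hold for general $m$, which appears as the exponent of $-1$ in the first formula.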

Note that the individual distributions are easily obtained from the joint distribution or survival functions by

\[ s_{T_i}(t) = s_{T_1, T_2, \dots, T_m}(0, 0, \dots, t, \dots, 0), \]

\[ F_{T_i}(t) = F_{T_1, T_2, \dots, T_m}(\infty, \infty, \dots, t, \dots, \infty), \]

where the $t$ is in the $i$th position, and $\infty$ indicates that you take limits.


17.3 The distribution of T


17.3.1 The general case


It is a simple matter to deduce the distribution of $T$ from the joint distribution. Clearly, the minimum will take a value greater than $t$ if and only if each $T_i$ takes a value greater than $t$. Therefore,

\[ s_T(t) = s(t, t, \dots, t). \tag{17.2} \]

(The reader should note that $F_T(t) \ne F(t, t, \dots, t)$.)


17.3.2 The independent case 


Let the density function, survival function, and hazard rate function of $T_i$ be denoted respectively by $f_i$, $s_i$, $\mu_i$. Things become much easier to deal with when the random variables $T_i$ are independent. We can then readily write down the relevant functions for $T$ in terms of the corresponding functions for $T_i$. For example, from (17.2) we obtain

\[ s_T(t) = s_1(t)\,s_2(t) \cdots s_m(t). \tag{17.3} \]

By taking logs and differentiating, we can find a similar relationship involving hazard functions.

\[ \mu_T(t) = \sum_{i=1}^{m} \mu_i(t). \tag{17.4} \]

We have already encountered a particular case of (17.4) in (10.12).

Example 17.1 Suppose that $T_1$ and $T_2$ are independent and both have the hazard function

\[ \mu(t) = \frac{2}{1 - t}, \qquad 0 \le t < 1. \]

Find the probability that the minimum value of these two random variables will be less than or equal to 1/2.


Solution. We note that $s_T(t) = (1 - t)^4$, for $0 \le t < 1$, either by first noting that, for $i = 1, 2$, $s_i(t) = (1 - t)^2$ or by noting that $\mu_T(t) = 4/(1 - t)$. The desired probability is $F_T(1/2) = 1 - s_T(1/2) = 15/16$.
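
The answer to Example 17.1 can be checked numerically by combining (17.4) with the usual relation $s(t) = \exp\left(-\int_0^t \mu(r)\,dr\right)$. The sketch below uses a simple midpoint rule; the function names and the number of steps are illustrative assumptions.

```python
import math


def survival_from_hazard(hazard, t, steps=20000):
    """Approximate s(t) = exp(-integral of hazard over [0, t]) by the midpoint rule."""
    h = t / steps
    integral = sum(hazard((k + 0.5) * h) for k in range(steps)) * h
    return math.exp(-integral)


def hazard_each(t):
    """Common hazard function of T_1 and T_2 in Example 17.1."""
    return 2.0 / (1.0 - t)


def hazard_min(t):
    """Hazard of T = min(T_1, T_2) for independent T_i, from (17.4)."""
    return hazard_each(t) + hazard_each(t)


prob = 1.0 - survival_from_hazard(hazard_min, 0.5)
print(f"P(T <= 1/2) is approximately {prob:.6f}")
```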


17.4 The joint distribution of (T, J) 


17.4.1 The distribution function for (T, J) 


Suppose that there is zero probability of the simultaneous occurrence of two or more failure times $T_1, T_2, \dots, T_m$. We can then define the random variable $J$ as the index of the random variable giving the minimum. For an example with $m = 3$, suppose $T_1 = 7$, $T_2 = 5$, and $T_3 = 10$. Then $T$ would take the value 5, and $J$ would take the value 2, since the minimum occurs for $T_2$.

In many applications, we are interested in the joint distribution of T and J. This can be 
described in various ways. One method is by the joint distribution function, 


\[ F_{T,J}(t, j) = P(T \le t \text{ and } J = j). \]

This joint distribution has the somewhat unusual feature that the random variable $T$ is normally continuous, while $J$ is discrete. Therefore, unlike the usual notation for distribution functions, the second variable in $F_{T,J}(t, j)$ is not cumulative, but refers to one specific index.

We now consider the problem of deducing the joint distribution of $(T, J)$ from the joint distribution of $T_1, T_2, \dots, T_m$.

If we are given the joint density function, then $F_{T,J}(t, j)$ is calculated by integrating over a suitable region (see (A.17)). Take $m = 2$. Then $F_{T,J}(t, 1)$ is the probability that $T_1$ takes a value less than or equal to $t$, and that $T_2$ takes any value that is greater than that taken by $T_1$. This is given by the double integral

\[ F_{T,J}(t, 1) = \int_0^t \int_u^{\infty} f_{T_1,T_2}(u, v)\,dv\,du, \tag{17.5} \]

and similarly

\[ F_{T,J}(t, 2) = \int_0^t \int_v^{\infty} f_{T_1,T_2}(u, v)\,du\,dv. \tag{17.6} \]

Example 17.2 The joint distribution of $T_1$ and $T_2$ is given by

\[ f_{T_1,T_2}(s, t) = \begin{cases} 6(s - t)^2, & 0 < s < 1,\ 0 < t < 1, \\ 0, & \text{elsewhere.} \end{cases} \]

Find $F_{T,J}(t, j)$ for $j = 1, 2$.

Solution.

\[ F_{T,J}(t, 1) = \int_0^t \int_u^1 6(u - v)^2\,dv\,du = 2\int_0^t (1 - u)^3\,du = \frac{1 - (1 - t)^4}{2}, \qquad \text{for } 0 < t < 1. \]

By symmetry, we must have

\[ F_{T,J}(t, 2) = \frac{1 - (1 - t)^4}{2}, \qquad 0 < t < 1. \]

Since $T_i$ is bounded above by 1 for $i = 1, 2$, we necessarily have

\[ F_{T,J}(t, i) = F_{T,J}(1, i) = 1/2, \qquad i = 1, 2, \quad t > 1. \]

The sum of $F_{T,J}(t, 1)$ and $F_{T,J}(t, 2)$ must of course equal $F_T(t) = 1 - s_T(t)$. It will be instructive for the reader to verify this by drawing a picture in the plane, showing that the union of the regions of integration in (17.5) and (17.6), and the region corresponding to $s_T(t)$, is the entire positive quadrant.


In the general case, we will need $m$-dimensional integrals to compute $F_{T,J}(t, j)$ from the density function. However, if we already have the joint survival function, the computation can be simplified. To illustrate, consider the case with $m = 3$. Then, reasoning as above,

\[ F_{T,J}(t, 1) = \int_0^t \int_u^{\infty} \int_u^{\infty} f(u, v, w)\,dw\,dv\,du. \]

From (17.1) the inner two integrals can be written compactly as a partial derivative. We have

\[ F_{T,J}(t, 1) = \int_0^t \sigma_1(u)\,du, \]

where

\[ \sigma_1(u) = -\frac{\partial}{\partial t_1} s(t_1, t_2, t_3), \quad \text{evaluated at } t_1 = t_2 = t_3 = u. \]

In the general case,

\[ F_{T,J}(t, j) = \int_0^t \sigma_j(u)\,du, \tag{17.7} \]

where

\[ \sigma_j(u) = -\frac{\partial}{\partial t_j} s(t_1, t_2, \dots, t_m), \quad \text{evaluated at } t_1 = t_2 = \cdots = t_m = u. \]


Example 17.3 Suppose that $m = 2$ and the joint survivor function is given by

\[ s_{T_1,T_2}(u, v) = \frac{1}{2}\left[(1 - u)^4 + (1 - v)^4 - (u - v)^4\right], \qquad 0 < u < 1,\ 0 < v < 1. \]

Find $F_{T,J}(t, 1)$.


Solution. We could take two derivatives to calculate $f(u, v) = 6(u - v)^2$, and apply (17.5). (This is in fact the same distribution as in Example 17.2.) Note, however, that (17.5) would just 'undo' the calculation of the second derivative by integrating. For this form of the distribution, it is easier to apply (17.7). On the given region

\[ -\frac{\partial}{\partial u} s(u, v) = 2\left[(1 - u)^3 + (u - v)^3\right], \qquad \sigma_1(u) = 2(1 - u)^3, \]

so

\[ F_{T,J}(t, 1) = \int_0^t 2(1 - u)^3\,du = \frac{1 - (1 - t)^4}{2}, \qquad 0 < t < 1, \]

verifying the previous example.
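
The two routes to $F_{T,J}(t, 1)$ for the density $6(u - v)^2$, namely the double integral (17.5) and the single integral (17.7), can be compared numerically. The sketch below uses midpoint sums; the function names and the grid size are illustrative assumptions.

```python
def density(u, v):
    """Joint density 6(u - v)^2 on the unit square."""
    return 6.0 * (u - v) ** 2


def F_TJ_1_double(t, n=400):
    """(17.5): integrate the density over 0 < u < t, u < v < 1 (midpoint rule)."""
    total = 0.0
    hu = t / n
    for i in range(n):
        u = (i + 0.5) * hu
        hv = (1.0 - u) / n          # v runs from u up to the bound 1
        inner = sum(density(u, u + (k + 0.5) * hv) for k in range(n)) * hv
        total += inner * hu
    return total


def F_TJ_1_sigma(t, n=400):
    """(17.7): integrate sigma_1(u) = 2(1 - u)^3 over 0 < u < t (midpoint rule)."""
    h = t / n
    return sum(2.0 * (1.0 - (i + 0.5) * h) ** 3 for i in range(n)) * h


def closed_form(t):
    """The closed form (1 - (1 - t)^4)/2, valid for 0 < t < 1."""
    return (1.0 - (1.0 - t) ** 4) / 2.0


t = 0.5
print(F_TJ_1_double(t), F_TJ_1_sigma(t), closed_form(t))
```

The single integral needs far fewer function evaluations, which is the practical advantage of (17.7) when the joint survival function is available.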


17.4.2 Density and survival functions for (T, J) 


We can also define the distribution of $(T, J)$ by the joint density function $f_{T,J}(t, j)$. This is the function satisfying

\[ F_{T,J}(t, j) = \int_0^t f_{T,J}(s, j)\,ds, \qquad f_{T,J}(t, j) = \frac{d}{dt} F_{T,J}(t, j). \tag{17.8} \]

The function $f$ is interpreted in the normal way. Namely, for 'small' $\Delta t$, $f_{T,J}(t, j)\Delta t$ is approximately the probability that the first failure will be from cause $j$ and that it will take place in the time interval from $t$ to $t + \Delta t$. The precise statement, following from the first expression in (17.8), is that the probability that the first failure will be from cause $j$ and will take place between time $a$ and time $b$ is given by

\[ F_{T,J}(b, j) - F_{T,J}(a, j) = \int_a^b f_{T,J}(t, j)\,dt. \]

We define the joint survival function for $(T, J)$ by thinking of survival as we did at the end of Section 11.2.2. Let

\[ s_{T,J}(t, j) = P(T > t \text{ and } J = j) = \int_t^{\infty} f_{T,J}(s, j)\,ds. \]

This is the probability that failure will occur after time $t$ due to cause $j$.

Figure 17.1 shows a typical graph of $f_{T,J}(t, j)$ for $m = 2$. Note that the mass is concentrated on parallel sheets. If we cut the $j$th sheet by the plane $T = t$, the area of the left portion will be $F_{T,J}(t, j)$ and the area of the right portion will be $s_{T,J}(t, j)$.

All functions pertaining to the random variable $T$ alone are obtained by summing over all $j$, as we noted above for $F$ in the case $m = 2$. That is,

\[ F_T(t) = \sum_{j=1}^{m} F_{T,J}(t, j), \tag{17.9} \]

\[ f_T(t) = \sum_{j=1}^{m} f_{T,J}(t, j), \tag{17.10} \]

\[ s_T(t) = \sum_{j=1}^{m} s_{T,J}(t, j). \tag{17.11} \]

Figure 17.1 The graph of $f_{T,J}(t, j)$


17.4.3 The distribution of J 


To obtain the distribution of $J$ from the joint distribution, we compute the other marginal, which can be expressed in various ways. If $f_J(j)$ denotes the probability that $J = j$, then

\[ f_J(j) = \int_0^{\infty} f_{T,J}(t, j)\,dt = s_{T,J}(0, j) = \lim_{t \to \infty} F_{T,J}(t, j). \tag{17.12} \]

In Figure 17.1, $f_J(j)$ is the area of the $j$th sheet. Note that $F_{T,J}(t, j) + s_{T,J}(t, j)$ is not equal to 1, but rather to $f_J(j)$.


Example 17.4 Take $m = 2$. If $T_1$ is uniform on [0, 1], $T_2$ is uniform on [0, 2], and $T_1$ and $T_2$ are independent, find the distribution of $J$.


Solution. The joint density function takes a constant value of 1/2 on the rectangle $0 \le s \le 1$, $0 \le t \le 2$, and is 0 elsewhere. The maximum value of $T$ is the minimum of the respective maximums of the $T_i$, which in this case is 1. Therefore,

\[ f_J(1) = F_{T,J}(1, 1) = \int_0^1 \int_u^2 (1/2)\,dv\,du = \frac{3}{4}, \]

\[ f_J(2) = F_{T,J}(1, 2) = \int_0^1 \int_v^1 (1/2)\,du\,dv = \frac{1}{4}. \]


17.4.4 Hazard functions for (T, J)


Definition 17.1 The hazard function for $(T, J)$ is given by

\[ \mu_{T,J}(t, j) = \frac{f_{T,J}(t, j)}{s_T(t)}. \]

This is a conditional density. For small $\Delta t$, $\mu_{T,J}(t, j)\Delta t$ is approximately the probability that failure will occur first from cause $j$ in the time interval from $t$ to $t + \Delta t$, given that failure from any cause has not yet taken place before time $t$.

In the Chapter 11 multiple-decrement model for a life age $x$, $\mu_{T,J}(t, j)$ corresponds to $\mu_x^{(j)}(t)$.

Given the hazard rates, we can obtain the joint distribution of $(T, J)$ by the same method as employed in Chapter 11. From (17.10) and the definition of $\mu_{T,J}(t, j)$,

\[ \mu_T(t) = \sum_{j=1}^{m} \mu_{T,J}(t, j), \tag{17.13} \]

and then, from (15.7),

\[ s_T(t) = \exp\left(-\int_0^t \mu_T(r)\,dr\right). \]

We know that $f_{T,J}(s, j) = s_T(s)\mu_{T,J}(s, j)$ and, from the first expression in (17.8),

\[ F_{T,J}(t, j) = \int_0^t s_T(s)\mu_{T,J}(s, j)\,ds. \tag{17.14} \]

The last formula is easily explained intuitively. For the event in question to occur, there must be some point $s$, before $t$, for which failure from any cause has not yet occurred, and then failure will occur from the $j$th cause at time $s$. Although (17.14) has this intuitive appeal, it is not necessarily useful for computing $F_{T,J}(t, j)$ as we may not know the joint hazard rates until we have already computed $F_{T,J}(t, j)$ and $f_{T,J}(t, j)$. It is, however, an important formula in the independent case to which we now turn.


17.4.5 The independent case


As in Section 17.2, we can simplify calculations when the $T_i$ are independent, and deduce the distribution of $(T, J)$ directly from the individual distributions of each $T_i$. Since $s_i'(t) = -s_i(t)\mu_i(t)$, (17.3) shows that

\[ \sigma_j(t) = -s_j'(t)\prod_{i \ne j} s_i(t) = s_T(t)\mu_j(t), \]

and then from (17.7),

\[ F_{T,J}(t, j) = \int_0^t s_T(u)\mu_j(u)\,du, \qquad \text{for independent } T_i. \tag{17.15} \]

Note, as a comparison to (17.14), that (17.15) can be used directly to compute $F_{T,J}(t, j)$ in the independent case, when we know $\mu_j$ from the individual distributions. Moreover, differentiating and dividing by $s_T(t)$ verifies that in the independent case

\[ \mu_{T,J}(t, j) = \mu_j(t) \tag{17.16} \]

for $j = 1, 2, \dots, m$ and all $t$ for which $s_T(t) > 0$. This provides the promised proof for the result stated in formula (11.23).

The result is easily explained intuitively. Looking at the machine model of Section 11.6, for example, both quantities in (17.16) give a conditional density for failure of part $j$ at time $t$. In the case of $\mu_{T,J}(t, j)$, the condition is that all parts have survived up to time $t$ and in general this may give information regarding the failure time of part $j$. Suppose, for example, that the parts are connected so that part 2 cannot fail until part 1 does, and then it fails five seconds later. (The same idea in the multiple decrement model provided the simple counter-example of Exercise 11.15.) In the independent case, however, we obtain exactly the same information as if we were told only that part $j$ has survived up to time $t$, which is precisely the condition applicable to $\mu_j(t)$.

We have already encountered special cases of (17.15). One example is (10.28). The hazard rate $\mu_x(t)$ corresponds to $\mu_1(t)$, which equals $\mu_{T,J}(t, 1)$ since we postulated independence. Another example is (11.13).


Example 17.5 Suppose that the $T_i$ are independent and exponential with constant hazard $\mu_i$. (a) Find $F_{T,J}(t, j)$. (b) Find $f_J(j)$. (c) Show that $T$ and $J$ are independent.


Solution.

(a) Let $\mu = \mu_1 + \mu_2 + \cdots + \mu_m$. Formula (17.4) shows that $T$ is exponential with constant hazard $\mu$, and, from (17.15),

\[ F_{T,J}(t, j) = \int_0^t e^{-\mu s}\mu_j\,ds = \frac{\mu_j}{\mu}\left(1 - e^{-\mu t}\right). \]

(b) $f_J(j) = \lim_{t \to \infty} F_{T,J}(t, j) = \mu_j/\mu$.

(c) We just note that

\[ F_{T,J}(t, j) = F_T(t) f_J(j). \]

The solution to (b) gives an important result that has many applications. It says that for independent exponential failure times, the probabilities of first failure are proportional to the hazard rates. This makes sense, since the higher the hazard rate, the lower the mean, which is the reciprocal of the hazard rate, and therefore the greater the likelihood of occurring first.
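
The result of Example 17.5 lends itself to a quick simulation check. The sketch below compares the closed form with simulated frequencies of the index of the first failure; the hazard values, the number of trials and the seed are arbitrary illustrative choices.

```python
import math
import random


def F_TJ_exponential(t, j, hazards):
    """(mu_j / mu)(1 - exp(-mu t)) for independent exponential failure times.

    The index j is zero-based here, unlike the text.
    """
    mu = sum(hazards)
    return hazards[j] / mu * (1.0 - math.exp(-mu * t))


def simulate_first_failure(hazards, trials=50_000, seed=0):
    """Estimate P(J = j) by simulating the failure times and taking the minimum."""
    rng = random.Random(seed)
    counts = [0] * len(hazards)
    for _ in range(trials):
        times = [rng.expovariate(h) for h in hazards]
        counts[times.index(min(times))] += 1
    return [c / trials for c in counts]


hazards = [0.01, 0.02, 0.07]             # assumed illustrative hazard rates
exact = [h / sum(hazards) for h in hazards]
estimated = simulate_first_failure(hazards)
print("exact    :", exact)
print("simulated:", estimated)
```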

Example 17.6 Take $m = 2$. Suppose $T_1$ and $T_2$ are independent, $T_1$ is uniform on $[0, a]$, and $T_2$ is uniform on $[0, b]$, where $0 < a < b$. Find $F_{T,J}(t, 1)$ and $F_{T,J}(t, 2)$.


Solution. Since $s_T(t)\mu_1(t) = s_1(t)s_2(t)\mu_1(t) = s_2(t)f_1(t)$, we can write

\[ F_{T,J}(t, 1) = \int_0^t \left(1 - \frac{s}{b}\right)\frac{1}{a}\,ds = \frac{t}{a} - \frac{t^2}{2ab} = F_1(t) - \frac{1}{2}F_1(t)F_2(t), \qquad 0 \le t \le a. \]

Similarly,

\[ F_{T,J}(t, 2) = \frac{t}{b} - \frac{t^2}{2ab} = F_2(t) - \frac{1}{2}F_1(t)F_2(t), \qquad 0 \le t \le a. \]

Since failure must take place before time $a$, for any $t > a$,

\[ F_{T,J}(t, 1) = F_{T,J}(a, 1) = 1 - \frac{a}{2b}, \qquad F_{T,J}(t, 2) = F_{T,J}(a, 2) = \frac{a}{2b}. \]

As a check, note that $F_{T,J}(t, 1) + F_{T,J}(t, 2) = F_1(t) + F_2(t) - F_1(t)F_2(t)$, which must be true, since for $T$ to take a value less than $t$ means that at least one of $T_1$ and $T_2$ takes a value less than $t$. Also note that the answer here could have been written down immediately, since the distributions satisfy the stochastic version of the condition given in (11.24), and therefore we can apply Method 2 of Chapter 11.
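
The formulas of Example 17.6 are simple enough to evaluate exactly. The following sketch uses rational arithmetic and, as a check, recovers the values 3/4 and 1/4 of Example 17.4 with $a = 1$ and $b = 2$. The function name is an illustrative choice, and the code assumes $t \ge 0$ and $0 < a < b$.

```python
from fractions import Fraction


def F_TJ_uniform(t, j, a, b):
    """F_{T,J}(t, j) for independent T_1 ~ U[0, a], T_2 ~ U[0, b], j in {1, 2}."""
    a, b = Fraction(a), Fraction(b)
    t = min(Fraction(t), a)             # the first failure cannot occur after a
    F1, F2 = t / a, t / b               # individual distribution functions
    own = F1 if j == 1 else F2
    return own - F1 * F2 / 2


print(F_TJ_uniform(1, 1, 1, 2), F_TJ_uniform(1, 2, 1, 2))

# Check the identity F(t,1) + F(t,2) = F1 + F2 - F1*F2 at one point.
t = Fraction(1, 2)
lhs = F_TJ_uniform(t, 1, 1, 2) + F_TJ_uniform(t, 2, 1, 2)
rhs = t + t / 2 - t * (t / 2)
print(lhs == rhs)
```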


17.4.6 Nonidentifiability 


We motivate the idea of this section by an example. First note that the joint distributions given in Examples 17.1 and 17.2 are easily seen to be different. One way is to note that $T_1$ and $T_2$ are not independent in the latter.


Example 17.7 Calculate $F_{T,J}(t, j)$ for the distribution given in Example 17.1.


Solution. We could do this by (17.15), but it is easier to note that, by symmetry, $F_{T,J}(t, 1) = F_{T,J}(t, 2)$ and the two must sum to $F_T(t) = 1 - s_T(t)$. We can conclude directly from Example 17.1 that

\[ F_{T,J}(t, 1) = F_{T,J}(t, 2) = \frac{1 - (1 - t)^4}{2}, \qquad 0 \le t < 1, \]

and necessarily

\[ F_{T,J}(t, 1) = F_{T,J}(t, 2) = \frac{1}{2}, \qquad 1 \le t. \]

Compare the above result with Example 17.2. The somewhat surprising conclusion is that two completely different distributions for $(T_1, T_2)$ have led to exactly the same distribution for $(T, J)$. This is known as the nonidentifiability problem and it has statistical implications. Suppose we want to make inferences about the joint distribution of the $T_i$ by observing failure times. In many cases, all we can possibly observe is the joint distribution of $(T, J)$. An example is when the random variables represent the time of death from various causes. Once death occurs, we know the time and the cause, but no further observation of the subject is possible. Our example above shows that it is impossible to uniquely determine the joint distribution of the random variables that give rise to a given $(T, J)$. We need additional information in order to obtain a unique solution. One instance when this occurs is in the independent case.


Theorem 17.1 Given any joint distribution for $(T, J)$ there is a unique joint distribution of
$(T_1, T_2, \ldots, T_m)$ such that the $T_i$ are mutually independent and induce the given distribution
of $(T, J)$.

Proof. Uniqueness follows immediately, since, given independence, we know the joint distribution if we know the distribution of each $T_i$, and (17.16) implies that each $T_i$ is necessarily
a random variable with hazard function $\mu_{T,J}(t, i)$, which is determined uniquely from $(T, J)$.

For the existence, given any joint distribution function $F_{T,J}$ for $(T, J)$, we let $T_i$ be a random
variable with hazard function $\mu_{T,J}(t, i)$. This collection of independent $T_i$ in turn generates a
joint distribution function $\hat{F}_{T,J}$. From (17.15),

$$\hat{F}_{T,J}(t, j) = \int_0^t s_T(s)\,\mu_{T,J}(s, j)\,ds,$$

which equals $F_{T,J}(t, j)$ as shown by (17.14).


To illustrate the use of this theorem, suppose you are told that $F_{T,J}(t, i)$, for $i = 1, 2$, is the
function found in Example 17.7 (equal to 1/2 for $t > 1$), and asked to identify the joint distribution of $(T_1, T_2)$. You cannot do this without
further information, for it could be either the joint distribution of Example 17.1 or that of
Example 17.2, or indeed several other possibilities. However, if you are given the additional
information that $T_1$ and $T_2$ are independent, then you know that it must be the distribution of
Example 17.1.


17.4.7 Conditions for the independence of T and J 


Another question of interest is to determine when T and J are independent. In Example 17.5, 
we saw that this occurred with constant hazard functions. We present here a more general 
criterion. Define the ratios 


$$K(t, j) = \frac{\mu_{T,J}(t, j)}{\mu_T(t)}$$

for all $j$, and all $t$ such that $s_T(t) > 0$.

Theorem 17.2 $K(t, j) = P(J = j \mid T = t)$. Therefore, $T$ and $J$ are independent if and only if
$K(t, j)$ is independent of $t$.

Proof. We have

$$f_{T,J}(t, j) = s_T(t)\,\mu_{T,J}(t, j) = K(t, j)\,s_T(t)\,\mu_T(t) = K(t, j)\,f_T(t).$$

So

$$P(J = j \mid T = t) = \frac{f_{T,J}(t, j)}{f_T(t)} = K(t, j).$$

The condition of this theorem is sometimes expressed by saying that the hazards for the 
individual causes are fixed proportions of the total hazard. 
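
The criterion of Theorem 17.2 is easy to explore numerically. The sketch below is illustrative only: it assumes independent causes, so that the cause-specific hazard $\mu_{T,J}(t, j)$ equals the hazard $\mu_j(t)$ of $T_j$ and $\mu_T(t)$ is their sum. The function name and the sample rates are hypothetical.

```python
def cause_ratios(hazards, t):
    """Return [K(t,1), ..., K(t,m)] for independent causes, where
    hazards[j] is the hazard function of the (j+1)th failure time."""
    rates = [mu(t) for mu in hazards]
    total = sum(rates)               # total hazard mu_T(t)
    return [rate / total for rate in rates]


# Constant hazards: the ratios do not depend on t, so T and J are independent.
constant = [lambda t: 0.02, lambda t: 0.06]

# Uniform failure times on [0, a] and [0, b], as in Example 17.6:
# the hazards are 1/(a - t) and 1/(b - t), and the ratios change with t.
a, b = 2.0, 5.0
uniform = [lambda t: 1.0 / (a - t), lambda t: 1.0 / (b - t)]
```

With the constant hazards, `cause_ratios(constant, t)` is approximately `[0.25, 0.75]` for every `t`. With the uniform failure times, `K(t, 1)` moves from 5/7 at `t = 0` to 4/5 at `t = 1`, so $T$ and $J$ are not independent there.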


17.5 Other problems 


There are several other questions regarding the joint distribution of $(T_1, T_2, \ldots, T_m)$ that can be
answered by similar techniques to those in Section 17.3. That is, we find a certain probability
by integrating the joint density of the failure times over a suitable region of $m$-dimensional space. As an example, we
illustrate the method for a problem analogous to Example 10.7 of Section 10.9. Take $m = 2$,
and consider the probability that both causes of failure will occur within a specified duration
of each other. That is, for some fixed $n$, we want $P(|T_1 - T_2| \le n)$. It will
normally be easier to compute this as

$$1 - P(|T_1 - T_2| > n) = 1 - [P(T_2 > T_1 + n) + P(T_1 > T_2 + n)].$$

Each term is found by integrating the joint density function over a suitable region in the
plane. For example,

$$P(T_2 > T_1 + n) = \int_0^\infty \int_{u+n}^\infty f_{T_1,T_2}(u, v)\,dv\,du.$$

Example 17.8 Find $P(|T_1 - T_2| \le n)$, when $T_1$, $T_2$ are independent, and both are exponential with hazard functions $\mu_1$ and $\mu_2$, respectively.


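Solution. The integral above reduces to

$$\frac{\mu_1}{\mu_1 + \mu_2}\,e^{-\mu_2 n},$$

so the final answer is

$$1 - \frac{\mu_1}{\mu_1 + \mu_2}\,e^{-\mu_2 n} - \frac{\mu_2}{\mu_1 + \mu_2}\,e^{-\mu_1 n}.$$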
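
As a cross-check of Example 17.8, the closed form can be compared with a direct numerical evaluation of the two tail integrals. This is only a sketch under stated assumptions: the function names, the truncation point `upper`, and the midpoint rule are illustrative choices and not part of the text.

```python
import math


def prob_within(n, mu1, mu2):
    """Closed form for P(|T1 - T2| <= n), independent exponentials."""
    total = mu1 + mu2
    return 1.0 - (mu1 * math.exp(-mu2 * n) + mu2 * math.exp(-mu1 * n)) / total


def prob_within_numeric(n, mu1, mu2, upper=40.0, steps=100_000):
    """Midpoint-rule check: integrate f1(u) s2(u + n) and f2(u) s1(u + n)
    over [0, upper], where upper truncates the infinite range."""
    h = upper / steps
    tail = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        # contribution to P(T2 > T1 + n)
        tail += mu1 * math.exp(-mu1 * u) * math.exp(-mu2 * (u + n)) * h
        # contribution to P(T1 > T2 + n)
        tail += mu2 * math.exp(-mu2 * u) * math.exp(-mu1 * (u + n)) * h
    return 1.0 - tail
```

When $\mu_1 = \mu_2 = \mu$ the closed form collapses to $1 - e^{-\mu n}$, which gives a quick sanity check.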


17.6 The common shock model


In many applications, we have a group of objects whose future lifetimes are generally independent, except that they are all subject to a common hazard, which will result in the failure
of all, should it occur. In the case of human lives, this could be a natural disaster such as a
hurricane. In the case of machine parts, it could be something like an electrical problem that
affects all components at once. The presence of the common shock introduces dependence
into what would otherwise be independent future lifetimes.

To model the general situation, we have $m + 1$ independent, continuous random variables,
$(T_1^*, T_2^*, \ldots, T_m^*, Z)$, and for each $i$ we let

$$T_i = \min(T_i^*, Z).$$

The interpretation is that $T_i^*$ is the time until failure of the $i$th object for reasons other than the
common shock, and $Z$ is the time until the common shock occurs. It follows then that $T_i$ will
be simply the time until failure of the $i$th object, since such failure will occur at either time
$T_i^*$ or time $Z$, whichever is earlier.

In the remainder of this section, we will confine ourselves to the case where $m = 2$.
Quantities referring to $T_i^*$ will have a superscript $*$.

We are interested in questions about the joint distribution of $(T_1, T_2)$, which involves dependent random variables. However, in many cases, we can answer these questions by considering
the independent collection $(T_1^*, T_2^*, Z)$. We will illustrate with several examples. A key fact to
note is that

$$T = \min(T_1, T_2) = \min(T_1^*, T_2^*, Z),$$

since both give the time of first failure.


Example 17.9 Find a formula for the probability that both objects will survive to time t. 


Solution. This is just $s_1^*(t)\,s_2^*(t)\,s_Z(t)$.


Example 17.10 What is the probability that failure will occur as a result of the common
shock?


Solution. This is just $P(J = 3)$ in the joint distribution of $(T, J)$, where $J$ takes the values 1,
2, 3 and $T$ is the minimum of $T_1^*$, $T_2^*$ and $Z$.


Example 17.11 What is the probability that the second failure will occur before time $t$?


Solution. We divide this up into two mutually exclusive cases. It will always occur if $Z \le t$.
If $Z > t$, we need both $T_1^*$ and $T_2^*$ less than or equal to $t$. The probability is

$$F_Z(t) + s_Z(t)F_1^*(t)F_2^*(t) = 1 - s_Z(t)\,[s_1^*(t) + s_2^*(t) - s_1^*(t)s_2^*(t)].$$

Other problems are not so straightforward and require special attention. The joint distribution of $(T_1, T_2)$ is quite different from the typical two-dimensional continuous distribution.
It still is continuous, but it has a mass of positive probability all concentrated on a single line,
namely the diagonal, since the occurrence of the common shock will cause failure from both
causes, leading to a failure point of the form $(t, t)$. In determining the probability that $(T_1, T_2)$
lies in some region $A$, we will in general have to break $A$ up into three pieces: the part that is
above the diagonal, the part that is below the diagonal, and the part that is on the diagonal.

We adopt the convention that the value of $T_1$ is on the horizontal axis. For the part of the
plane above the diagonal, $\{(u, v) : u < v\}$, we use the joint density function

$$f_1^*(u)\,f_2(v),$$

since the only way $T_1$ can take a value $u < v$ is if $T_1^*$ took the value $u$. In other words, failure
from cause 1 at time $u$ did not occur from the common shock, since if it did, then failure from
cause 2 would also have occurred at time $u$ and could not have occurred at the later date $v$.

Similarly, for the part of the positive quadrant below the diagonal, $\{(u, v) : v < u\}$, we use
the joint density function

$$f_1(u)\,f_2^*(v).$$

Since $T_i = \min(T_i^*, Z)$, the densities $f_i$, $i = 1, 2$, are easily calculated as

$$f_i(t) = -(s_i^* s_Z)'(t) = f_i^*(t)s_Z(t) + s_i^*(t)f_Z(t). \tag{17.17}$$

Failure on the diagonal arises if and only if the common shock occurs
before the other two causes. We use the one-dimensional density function

$$f_{T,J}(t, 3) = s_1^*(t)\,s_2^*(t)\,f_Z(t),$$

and project the diagonal onto the line. That is, to find the probability that failure took place at
a point $(t, t)$, where $a < t < b$, we integrate this density from $a$ to $b$.

Example 17.12 Suppose $T_1^*$, $T_2^*$, and $Z$ are exponential with hazard functions $\mu_1$, $\mu_2$, and
$\rho$, respectively. Consider the event that $T_1$ and $T_2$ are both less than or equal to $n$. This can
be subdivided into three cases according as (a) $T_1 < T_2$, (b) $T_1 > T_2$, (c) $T_1 = T_2$. Find the
probability of each case.


Solution.

(a) Note first that, from (17.17), we can calculate

$$f_2(t) = (\mu_2 + \rho)\,e^{-(\mu_2 + \rho)t},$$

so that, using the above-diagonal joint density, the required probability is

$$\mu_1(\mu_2 + \rho)\int_0^n \int_u^n e^{-\mu_1 u}\,e^{-(\mu_2 + \rho)v}\,dv\,du,$$

which equals

$$\frac{\mu_1}{\mu_1 + \mu_2 + \rho} - e^{-(\mu_2 + \rho)n} + \frac{\mu_2 + \rho}{\mu_1 + \mu_2 + \rho}\,e^{-(\mu_1 + \mu_2 + \rho)n}.$$

(b) Similarly, the required probability in this case is

$$\frac{\mu_2}{\mu_1 + \mu_2 + \rho} - e^{-(\mu_1 + \rho)n} + \frac{\mu_1 + \rho}{\mu_1 + \mu_2 + \rho}\,e^{-(\mu_1 + \mu_2 + \rho)n}.$$

(c) The required probability is

$$\int_0^n \rho\,e^{-\rho t}\,e^{-\mu_1 t}\,e^{-\mu_2 t}\,dt = \frac{\rho}{\mu_1 + \mu_2 + \rho}\left(1 - e^{-(\mu_1 + \mu_2 + \rho)n}\right).$$

The sum of these three cases is

$$1 - e^{-(\mu_1 + \rho)n} - e^{-(\mu_2 + \rho)n} + e^{-(\mu_1 + \mu_2 + \rho)n},$$

as we can verify from the general formula given in Example 17.11.
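
The three case probabilities of Example 17.12 and their sum can be checked with a few lines of code. This is a minimal sketch with hypothetical function names. The second function evaluates the formula of Example 17.11 with exponential survival functions.

```python
import math


def common_shock_cases(n, mu1, mu2, rho):
    """Probabilities of (a) T1 < T2, (b) T1 > T2, (c) T1 = T2,
    each jointly with the event that both T1 and T2 are at most n."""
    m = mu1 + mu2 + rho
    e_all = math.exp(-m * n)
    case_a = mu1 / m - math.exp(-(mu2 + rho) * n) + (mu2 + rho) / m * e_all
    case_b = mu2 / m - math.exp(-(mu1 + rho) * n) + (mu1 + rho) / m * e_all
    case_c = rho / m * (1.0 - e_all)
    return case_a, case_b, case_c


def both_failed_by(n, mu1, mu2, rho):
    """Example 17.11: 1 - s_Z [s1* + s2* - s1* s2*] at time n."""
    s1, s2, sz = math.exp(-mu1 * n), math.exp(-mu2 * n), math.exp(-rho * n)
    return 1.0 - sz * (s1 + s2 - s1 * s2)
```

For any choice of positive rates, `sum(common_shock_cases(n, mu1, mu2, rho))` should agree with `both_failed_by(n, mu1, mu2, rho)` up to rounding. As `n` grows, the three cases approach `mu1 / m`, `mu2 / m` and `rho / m`, where `m = mu1 + mu2 + rho`.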


17.7 Copulas 


This section, like the previous one, is concerned with situations where there is a lack of 
independence. We present a general method that is often used to deal with this. Attention is 
confined to the case m = 2. 

A joint distribution of $(T_1, T_2)$ can be thought of as having two ingredients. One is the
distributions of the two component random variables, and the other is the way in which these
are linked together. The latter can be described by a device known as a copula, which can
then be applied to an arbitrary pair of individual distributions. The copula provides a means
of dealing with these two ingredients separately. To elaborate, we start with the observation
that whenever $T_1$ and $T_2$ are independent, we immediately recover the joint distribution from
the individual distributions by the rule

$$F_{T_1,T_2}(t_1, t_2) = F_{T_1}(t_1)\,F_{T_2}(t_2). \tag{17.18}$$

We can then ask whether we can replace the multiplication on the right-hand side of
(17.18) by other transformations, and still obtain a joint distribution. That is, if $I$ denotes the
unit interval $[0, 1]$, we ask whether we can find a function $C$ from $I \times I$ to $I$, so that we obtain a
legitimate joint distribution by the rule

$$F_{T_1,T_2}(t_1, t_2) = C(F_{T_1}(t_1), F_{T_2}(t_2)). \tag{17.19}$$

We need some restrictions on the function $C$. Take any point $s$ in $I$. If $T_1 > s$ with probability 1, so that
$F_{T_1}(s) = 0$, then, for all $t$, $F_{T_1,T_2}(s, t) = 0$. The same holds for $T_2$, leading to the condition that
for all $u, v$ in $I$,

$$C(0, v) = C(u, 0) = 0. \tag{17.20}$$

If $T_1 \le s$ with probability 1, so that $F_{T_1}(s) = 1$, then for all $t$, $F_{T_1,T_2}(s, t) = F_{T_2}(t)$, leading to the condition
that for all $u, v$ in $I$,

$$C(1, v) = v, \qquad C(u, 1) = u. \tag{17.21}$$

Another requirement stems from the fact that probabilities cannot be negative. For any
sub-rectangle $R \subset I \times I$, the probability that $(T_1, T_2)$ lies in $R$ is just the sum of the values of
$F_{T_1,T_2}$ on the northeast and southwest corners, minus the sum of the values on the other two
corners. Since this is nonnegative, it follows that

$$C(u_2, v_2) + C(u_1, v_1) - C(u_1, v_2) - C(u_2, v_1) \ge 0 \tag{17.22}$$

whenever $u_1 \le u_2$ and $v_1 \le v_2$.
These are the only conditions we need and we can now state the formal definition.


Definition 17.2 A copula is a function $C$ from $I \times I$ to $I$ satisfying (17.20)-(17.22).


It can be shown that if $C$ is a copula, then (17.19) gives a valid joint probability distribution
for any $T_1$ and $T_2$. Conversely (and harder to show), for any joint distribution of $(T_1, T_2)$ there is
a copula $C$ such that $F_{T_1,T_2}$ is given by (17.19).

If $T_1$ and $T_2$ are both uniformly distributed on $I$, then $F_{T_i}(u) = u$ for $i = 1, 2$, and all $u \in I$,
from which it follows that

$$F_{T_1,T_2}(u, v) = C(u, v).$$

This shows that as an alternate definition, we can simply define a copula as a distribution
function of a joint distribution involving two random variables that are uniform on $I$.
The following are three simple examples of copulas: 


1. C(u,v) = uv. This is just the copula for an independent distribution, as mentioned; 

2. C(u,v) = min(u, v); 

3. C(u,v) = max(u + v — 1,0). 

Copulas 2 and 3 are extreme in the sense that for any copula $C$ and for all $u, v \in I$,

$$\max(u + v - 1, 0) \le C(u, v) \le \min(u, v).$$

They are also extreme in the following sense. Consider all possible joint distributions
for a given $T_1$ and $T_2$. In many cases, one is interested in the sum $T_1 + T_2$. For example, an
insurer sells two insurance contracts and $T_i$ denotes the claim on the $i$th policy, or a person
buys two stocks and $T_i$ is the value of the $i$th stock at some future date. We may want to
compare all possible joint distributions as to their degree of risk. We will not go into the
details of comparing joint distributions as to risk here. (In Section 22.4, we do introduce this
idea for single-variable distributions.) However, it seems clear that riskier possibilities arise
when large values of one random variable tend to go with large values of the other,
so there is a tendency for either both values to be large or both to be small. The less risky
possibilities arise when large values of one tend to go with small values of the other, so there
is a possibility for bad results in one case to be balanced by good results in the other. It can
be shown that, under some natural risk-comparing criteria, copula 2 will give the most risky
joint distribution and copula 3 the least risky joint distribution. This point is illustrated further
in Exercise 17.14.
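
The defining conditions (17.20)-(17.22) and the bounds between copulas 3 and 2 can be spot-checked on a grid. The sketch below is illustrative only: the function names are hypothetical, and passing a grid check is a necessary condition, not a proof that a function is a copula. Only the cells between adjacent grid points are tested for (17.22), since the mass of any grid rectangle is a sum of such cell masses.

```python
def copula_1(u, v):
    return u * v                    # independence


def copula_2(u, v):
    return min(u, v)


def copula_3(u, v):
    return max(u + v - 1.0, 0.0)


def is_copula_on_grid(c, n=20, tol=1e-12):
    """Check (17.20), (17.21) and (17.22) at the grid points i/n."""
    grid = [i / n for i in range(n + 1)]
    for w in grid:
        if abs(c(0.0, w)) > tol or abs(c(w, 0.0)) > tol:           # (17.20)
            return False
        if abs(c(1.0, w) - w) > tol or abs(c(w, 1.0) - w) > tol:   # (17.21)
            return False
    for i in range(n):
        for j in range(n):
            u1, u2, v1, v2 = grid[i], grid[i + 1], grid[j], grid[j + 1]
            mass = c(u2, v2) + c(u1, v1) - c(u1, v2) - c(u2, v1)   # (17.22)
            if mass < -tol:
                return False
    return True


def within_extremes(c, n=20, tol=1e-12):
    """Check copula_3 <= c <= copula_2 at the grid points."""
    grid = [i / n for i in range(n + 1)]
    return all(copula_3(u, v) - tol <= c(u, v) <= copula_2(u, v) + tol
               for u in grid for v in grid)
```

All three example copulas pass `is_copula_on_grid`, while a function such as `max(u, v)` fails at once because it violates (17.20).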

We deal only briefly with problems of choosing a copula to model a given situation. In
many cases, the modeler likes to choose a copula from a parametric family, and select the
parameter to suit certain conditions. A popular choice for this is Frank's family of copulas
given by

$$C_\theta(u, v) = \frac{1}{\theta}\log\left(1 + \frac{(e^{\theta u} - 1)(e^{\theta v} - 1)}{e^{\theta} - 1}\right),$$

where $\theta$ can be any nonzero real number. It can be shown, using L'Hôpital's rule, that for all
$u, v \in I$,

$$\lim_{\theta \to 0} C_\theta(u, v) = uv,$$

so that the smaller the parameter is in absolute value, the greater the extent of independence
between the two random variables, with full independence occurring in the limit as $\theta \to 0$.
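
Frank's copula is straightforward to evaluate. The following is a minimal sketch, assuming the form of $C_\theta$ written above. The function name is hypothetical, and `math.expm1` and `math.log1p` are used only to limit rounding error for small $\theta$. For very large $|\theta|$ the expression can run into floating-point limits, so moderate values are assumed.

```python
import math


def frank(u, v, theta):
    """Frank's copula C_theta(u, v) for nonzero theta, u and v in [0, 1]."""
    num = math.expm1(theta * u) * math.expm1(theta * v)   # (e^{tu}-1)(e^{tv}-1)
    return math.log1p(num / math.expm1(theta)) / theta
```

For a very small parameter, for example `frank(0.3, 0.7, 1e-6)`, the value is close to the independence value $uv = 0.21$, in line with the limit stated above.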
An interesting feature which some copulas, but not all, have is that 


$$C(u, v) = u + v - 1 + C(1 - u, 1 - v). \tag{17.23}$$

It is straightforward to verify this property for copulas 1-3 above. It is true, but harder to
verify, that this holds for Frank's family. The significance of (17.23) is that

$$s_{T_1,T_2}(t_1, t_2) = 1 - F_{T_1}(t_1) - F_{T_2}(t_2) + F_{T_1,T_2}(t_1, t_2) = C(s_{T_1}(t_1), s_{T_2}(t_2)).$$


In other words, the same transformation rule can be applied to either distribution or survival 
functions. 
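
Property (17.23) can be spot-checked on a grid for the three basic copulas. The sketch below is illustrative and its names are hypothetical. The Clayton copula with parameter 1, $C(u, v) = uv/(u + v - uv)$, is brought in here only as an outside example of a copula that lacks the property. It does not appear in the text.

```python
def copula_1(u, v):
    return u * v


def copula_2(u, v):
    return min(u, v)


def copula_3(u, v):
    return max(u + v - 1.0, 0.0)


def clayton(u, v):
    """Clayton copula with parameter 1 (an outside example, not from the text)."""
    if u == 0.0 or v == 0.0:
        return 0.0
    return u * v / (u + v - u * v)


def has_property_17_23(c, n=20, tol=1e-12):
    """Check C(u,v) = u + v - 1 + C(1-u, 1-v) at the grid points i/n."""
    grid = [i / n for i in range(n + 1)]
    return all(abs(c(u, v) - (u + v - 1.0 + c(1.0 - u, 1.0 - v))) <= tol
               for u in grid for v in grid)
```

Copulas 1-3 pass the check, while `clayton` fails it, for example at $u = v = 0.2$.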


Example 17.13 Suppose that Demoivre's law holds with $\omega = 100$. Consider two lives (60)
and (70). Find the probability that both will be alive at the end of 10 years, assuming each of
the three basic copulas given above, for the joint distribution of $T(60)$ and $T(70)$.


Solution. The individual survival probabilities are 3/4 and 2/3, so the survival probability of the joint-life
status is, in the respective cases:


1. 1/2, as we know already from Chapter 10;

2. min{3/4, 2/3} = 2/3. This copula applied to a joint-life status just means that the
younger life will die at exactly the same time as the older;

3. 3/4 + 2/3 - 1 = 5/12.
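
The arithmetic of Example 17.13 can be reproduced exactly with rational numbers. This is a small sketch with hypothetical names. It applies each copula directly to the survival probabilities, which is the rule justified by (17.23).

```python
from fractions import Fraction


def survival_demoivre(age, years, omega=100):
    """Probability that a life of the given age survives the given number
    of years when failure is uniform on the remaining omega - age years."""
    return Fraction(omega - age - years, omega - age)


s60 = survival_demoivre(60, 10)   # 30/40 = 3/4
s70 = survival_demoivre(70, 10)   # 20/30 = 2/3

joint_survival = {
    "copula 1": s60 * s70,
    "copula 2": min(s60, s70),
    "copula 3": max(s60 + s70 - 1, Fraction(0)),
}
```

The dictionary holds the three answers 1/2, 2/3 and 5/12 in that order.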


For a particular application of copulas, refer back to the nonidentifiability problem of 
Section 17.4.6. Instead of assuming independence, we might postulate a certain copula C and 
then ask for a joint distribution with the chosen copula that gives rise to the given distribution 
(T, J). In many cases this will be unique. 


Notes and references 


Nelsen (1999) is a good general source for additional material on copulas, including full
derivations of the unproved results that we have given. Frees and Valdez (1998) discuss various
actuarial applications of copulas. Carriere (1994b) discusses the application of copulas to
the nonidentifiability problem in multiple-decrement theory. For methods of comparing two-variable distributions for risk, see Shaked and Shanthikumar (2007), Chapter 6.


Exercises 


Type A exercises 


17.1 A joint distribution is given by

$$f_{T_1,T_2}(s, t) = \begin{cases} 3s, & 0 < s < t < 1, \\ 3t, & 0 < t < s < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Find $F_{T,J}(t, 1)$ and $F_{T,J}(t, 2)$.

17.2 A joint survival function is given by

$$s(u, v) = \begin{cases} 1 - (3/2)(u^2 + v^2) + (1/2)u^3 + (3/2)uv^2, & 0 \le v \le u \le 1, \\ 1 - (3/2)(u^2 + v^2) + (1/2)v^3 + (3/2)u^2 v, & 0 \le u \le v \le 1. \end{cases}$$

Find $F_{T,J}(t, 1)$.

17.3 Suppose that $T_1$ and $T_2$ are independent with p.d.f.'s

$$f_1(t) = 2e^{-2t}, \qquad f_2(t) = 3e^{-3t}.$$

(a) Find $F_{T,J}(t, 1)$ and $F_{T,J}(t, 2)$.
(b) Find the distribution of $J$.


17.4 Suppose that $T_1$ and $T_2$ are independent. $T_1$ has an exponential distribution with
constant hazard rate $\mu$. $T_2$ is uniform on $[0, a]$. Find (a) $F_{T,J}(t, 2)$, (b) $P(J = 2)$.


17.5 The failure times $T_1$ and $T_2$ are independent, and have respective hazard functions

$$\mu_1(t) = \frac{3}{2 - t}, \quad 0 \le t < 2, \qquad \mu_2(t) = \log(2).$$

Find the probability that the minimum value of these two random variables will be
less than or equal to 1.


17.6 A machine is subject to two independent causes of failure. The time of the first cause
of failure is uniformly distributed on the interval [0, 4]. The time of the second cause
of failure is uniformly distributed on the interval [0, 5]. Find the probability that: (a)
the machine will fail from cause 1 before time 3; (b) the machine will eventually fail
from cause 2.


17.7 Two failure times $T_1$ and $T_2$ have a joint distribution given by the joint density function


4v—-uy, O<u<v<l, 
frrQ5v)-44u-v, O<v<uK<l, 


0, elsewhere. 


Find: (a) $F_{T,J}(t, 1)$ and $F_{T,J}(t, 2)$; (b) the distribution of $J$; (c) $\mu_{T,J}(t, 1)$ and $\mu_{T,J}(t, 2)$.


17.8 Two failure times $T_1$ and $T_2$ have a joint distribution given by the joint density function

$$f_{T_1,T_2}(u, v) = \begin{cases} (8/3)v^2, & 0 < u < v < 1, \\ (8/3)uv, & 0 < v < u < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

(a) Find $F_{T,J}(t, 1)$ and $F_{T,J}(t, 2)$.

(b) Find the distribution of $J$.

(c) Suppose that $\hat{T}_1$ and $\hat{T}_2$ are two independent random variables whose joint distribution
leads to the same distribution of $(T, J)$ as you found in part (a). What are the
hazard functions of $\hat{T}_1$ and $\hat{T}_2$?


Type B exercises 


17.9 For the joint distribution given in Example 17.2, find $\mu_{T,J}(t, j)$ and $\mu_j(t)$ for $j = 1, 2$. Show
that these are not the same.


17.10 For the joint distribution given in Exercise 17.2, find $\mu_{T,J}(t, j)$ and $\mu_j(t)$ for $j = 1, 2$. Show
that these are not the same.


17.11 Consider independent failure times $T_1$ and $T_2$ where the hazard rate of $T_1$ is $\alpha/(1 - t)$,
$0 \le t < 1$, and the hazard rate of $T_2$ is $\beta/(1 - t)$, $0 \le t < 1$. Answer the following
in terms of $\alpha$ and $\beta$.

(a) Find $F_{T,J}(t, 1)$.

(b) Find $P(J = 2)$.

(c) An insurance contract provides for a benefit at the moment of the first failure,
provided that this is due to 'cause 1' (i.e. provided that $T_1 < T_2$). The amount of
the benefit for failure at time $t$ is $(1 - t)e^{0.1t}$. The force of interest $\delta$ is a constant
0.10. Find the expected present value of the benefits.


17.12 Two lives age (x) and (y) are subject to the common shock model. $T^*(x)$ has a constant
hazard rate of 0.06, $T^*(y)$ has a constant hazard rate of 0.04, and $Z$ has a constant
hazard rate of 0.02. The force of interest is a constant 0.05.

(a) Calculate $\bar{A}_x(1_\infty)$ and write it as a sum of three terms, namely, the expected present
value of benefits when: (i) (x) dies at a time strictly before the death of (y); (ii)
(x) dies at a time strictly after the death of (y); (iii) (x) dies as a result of the common
shock.

(b) Calculate Az.


17.13 You are given a common shock model $(T_1^*, T_2^*, Z)$ where $T_1^*$ has the survival function
$s(t) = 1 - 0.1t$, $T_2^*$ has a constant hazard function of 0.04, and $Z$ has a constant hazard
function of 0.02. Find the probabilities that (a) $T_1 < T_2$, (b) $T_2 < T_1$, (c) $T_1 = T_2$.


17.14 Suppose that $T_1$ and $T_2$ each take the values 0 with probability 1/2, and 1 with
probability 1/2. Calculate the probability function for the joint distribution of $(T_1, T_2)$
under each of the following copulas: (a) $C(u, v) = \min(u, v)$, (b) $C(u, v) = \max(u + v - 1, 0)$,
(c) $C(u, v) = uv$, (d) Frank's copula with values of $\theta = 0.01, 50, -50$. What
happens when $\theta$ approaches $\infty$ or $-\infty$?


17.15 A joint life insurance on (x) and (y) has death benefits which are constant over each
year. Assume that either copula 2 or copula 3 of Section 17.7 applies to the joint distribution of
$T(x)$ and $T(y)$. Show that unlike the independent case, the term $R$ in equation (10.15)
is equal to 0.


PART III 


ADVANCED STOCHASTIC 
MODELS 


18 


An introduction to 
stochastic processes 


18.1 Introduction 


The purpose of this chapter is to provide background for the remaining chapters in the 
book. We make much use of the concept of conditioning, so the reader may wish to review 
Sections A.2 and A.8 of Appendix A. 

A stochastic process is the tool used to model a quantity varying randomly in time. The
following are the essential ingredients. We have an index set $T$, which gives the points of
time that we are interested in. Normally, $T$ will either be the nonnegative integers, 0, 1, 2, ...
(discrete time) or the whole nonnegative line $[0, \infty)$ (continuous time). (In both cases, we will
sometimes have a maximum time horizon that we are interested in, in which case $\infty$ will be
replaced by a finite $N$.) We also need a sample space with a probability measure $P$, and for
each $t$ in $T$, a random variable $X_t$ defined on this space. The random variable $X_t$ gives the
value at time $t$ of the random quantity that we are trying to model. A stochastic process can
then be defined formally as a collection of random variables $X_t$ defined for each $t$ in a set $T$.

We will illustrate briefly by considering the price of a certain stock. Anyone who has been 
involved with the stock market can attest that this is indeed a quantity that varies in time, and 
is subject to all kinds of random influences. Let time be discrete and refer to days. Suppose 
the stock is selling for 100 per share now, and we know that each day it will either increase 
by 20% or decrease by 20%. Therefore, $X_0$ takes a value of 100 with probability 1, while $X_1$
will take a value of either 80 or 120, $X_2$ will take one of three possible values, 144, 96, or
64, and so on. The details are shown in Figure 18.1. Of course, to complete the model, we
have to specify probabilities. We will not do this quite yet, as we want to first discuss the 
complications that arise. 

Figure 18.1 Evolution of stock price

The first principle to observe is that we are not just interested in the distribution of each
$X_t$ but also in all the possible joint distributions. To illustrate, suppose we want to know the
probability that the stock will be priced at 115.20 at time 3. From Figure 18.1, we can identify
three mutually exclusive ‘path segments’ leading from 100 at time 0 to 115.20 at time 3. The
desired probability can then be written as

$$P(X_0 = 100, X_1 = 120, X_2 = 144, X_3 = 115.20) + P(X_0 = 100, X_1 = 120, X_2 = 96, X_3 = 115.20) + P(X_0 = 100, X_1 = 80, X_2 = 96, X_3 = 115.20).$$


Many of the questions we ask of stochastic processes are, like the above, concerned with
realizations. A realization (sometimes called a sample path or scenario) of a stochastic process
is a function defined on the index set $T$, sending $t$ to $x_t$, which represents a possible outcome
of the process, namely that in which the random variable $X_t$ takes the value $x_t$ for all $t$ in $T$.
For instance, in our example above, the realization $x_t = 100 \times 1.2^t$ represents the outcome
whereby the stock continually moves upward.

It is often useful to view a stochastic process as a model for assigning probabilities to
realizations. A technical difficulty arises, however. Normally $T$ is infinite, and $X_t$ takes at
least two values, except possibly for $X_0$. This means there are uncountably many realizations,
and so each single realization will normally have probability 0. We must therefore deal with
infinite sets of realizations, as these can have positive probability. (This is exactly analogous
to the situation with any continuous random variable $X$ where we know that the probability
that $X$ takes a particular value of $x$ is always 0, but we are interested in the probability that $X$
takes values in some infinite set.)

Note that in our stock example, an event such as $(X_0 = 100, X_1 = 120, X_2 = 144,
X_3 = 115.20)$, which we referred to above, is not by itself a realization (unless our index
set $T$ were just $\{0, 1, 2, 3\}$). Rather, it is an infinite set of realizations, consisting of all those
with the given value of $x_t$ for $t = 0, 1, 2, 3$, and including all possible values for $t > 3$.

The reader should realize at this point that if the index set $T$ is in fact finite, then a
stochastic process is formally nothing more than a multi-dimensional joint distribution.


18.2 Markov chains 
18.2.1 Definitions 


Throughout this section, we will assume that our index set $T$ is the nonnegative integers and
that each $X_t$ is discrete.

To answer relevant questions about a stochastic process, we want to be able to compute all 
finite joint distributions. Normally, however, we do not do so directly but deduce these joint 
distributions through some other information that we can determine about our model. Suppose 
we are at time k, and we want to predict what will happen to our quantity at time k + 1. This 
can be quite complicated as it could depend on the entire past history of what happened up to 
time k. In many applications, this is simplified, since the only relevant part of this past history 
for this prediction is the actual value at time k, and we get no further information from looking 
at the values before that time. The following formulates this precisely. 


Definition 18.1 A discrete-time stochastic process with discrete random variables is called
a Markov chain if, given any finite sequence $x_0, x_1, x_2, \ldots, x_{k+1}$, where $x_i$ is a possible value
of $X_i$,

$$P(X_{k+1} = x_{k+1} \mid X_k = x_k, X_{k-1} = x_{k-1}, \ldots, X_0 = x_0) = P(X_{k+1} = x_{k+1} \mid X_k = x_k). \tag{18.1}$$


To illustrate, consider the stock example given above. Suppose that we decide that each
day the stock will move up with probability 2/3 and move down with probability 1/3. This
is clearly a Markov chain since both the left-hand side and right-hand side of (18.1) are 2/3 if
$x_{k+1} = 1.2x_k$, 1/3 if $x_{k+1} = 0.8x_k$, and 0 in all other cases. Suppose, however, that we decide
that these probabilities will hold only when the stock has made two different movements on
the previous two days. On the other hand, we decide that if the stock moves up two days in
a row, it signifies a trend, and the probability of an upward move on the next day changes to
3/4, while if the stock moves down two days in a row, it signifies a pessimistic attitude,
and the probability of an upward move on the next day is only 3/5. This would no longer be
a Markov chain, since the probability of $X_{k+1}$ is clearly influenced by the values of $X_{k-2}$ and
$X_{k-1}$ as well as that of $X_k$. For example, taking $k = 3$,

$$P[X_4 = 92.16 \mid X_3 = 115.20, X_2 = 96, X_1 = 80] = \frac{1}{4},$$

but

$$P[X_4 = 92.16 \mid X_3 = 115.20, X_2 = 96, X_1 = 120] = \frac{1}{3}.$$

In a Markov chain, these would both have to equal the same number, namely $P[X_4 = 92.16 \mid X_3 = 115.20]$.
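
The history-dependent rule just described can be written out directly. The sketch below is illustrative and its function names are hypothetical. The text does not say what happens before two movements have been observed, so the code assumes the basic 2/3 probability in that case.

```python
from fractions import Fraction


def up_probability(path):
    """Probability of an upward move on the next day, given the observed
    prices X_0, ..., X_k in `path`, under the modified rule in the text."""
    if len(path) < 3:
        return Fraction(2, 3)      # assumption: too little history observed
    first_up = path[-2] > path[-3]
    second_up = path[-1] > path[-2]
    if first_up and second_up:
        return Fraction(3, 4)      # two up moves in a row: a trend
    if not first_up and not second_up:
        return Fraction(3, 5)      # two down moves in a row
    return Fraction(2, 3)          # two different movements


def down_probability(path):
    return 1 - up_probability(path)


price = Fraction(576, 5)           # 115.20
path_a = [100, 80, 96, price]      # down, up, up
path_b = [100, 120, 96, price]     # up, down, up
```

Here `down_probability(path_a)` is 1/4 and `down_probability(path_b)` is 1/3, even though both paths end at the same price of 115.20. This is exactly the failure of the Markov property noted above.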
The important feature of the Markov property is that it allows us to compute all relevant
probabilities once we know the probabilities of each ‘branch’ of the tree-like structure, as we
have drawn in Figure 18.1. To elaborate, we introduce some fundamental notation.

For any integers $k < n$, let

$$p_{xy}(k, n) = P(X_n = y \mid X_k = x). \tag{18.2}$$

The probability of any branch is of the form $p_{xy}(k, k + 1)$ and we can multiply the probabilities of each branch to get the probability of any path. Finally, we can compute the general
probability $p_{xy}(k, k + n)$ by adding the probabilities on all paths that lead from a value of $x$ at
time $k$ to a value of $y$ at time $k + n$.

Consider our stock example with the (2/3, 1/3) probabilities of the respective up or down
move. We can now easily answer our question, and find the probability that the stock price
is 115.20 at time 3, that is, $p_{100,115.20}(0, 3)$. There are 3 paths leading from the starting value
of 100 to the value of 115.20 at time 3. Each of them consists of two up moves and one
down move, and will have probability equal to $\frac{2}{3} \times \frac{2}{3} \times \frac{1}{3} = \frac{4}{27}$, and so the total probability in
question is 12/27.
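
The path-summing calculation can be reproduced by enumerating every sequence of up and down moves. This is a small sketch with hypothetical names, using exact rational arithmetic so that prices such as 115.20 compare exactly.

```python
from fractions import Fraction
from itertools import product


def price_distribution(steps, start=100, up=Fraction(6, 5),
                       down=Fraction(4, 5), p_up=Fraction(2, 3)):
    """Distribution of the stock price after `steps` days, obtained by
    multiplying branch probabilities along each path and adding over paths."""
    dist = {}
    for moves in product((True, False), repeat=steps):
        price, prob = Fraction(start), Fraction(1)
        for is_up in moves:
            price *= up if is_up else down
            prob *= p_up if is_up else 1 - p_up
        dist[price] = dist.get(price, Fraction(0)) + prob
    return dist


three_day = price_distribution(3)
```

The entry `three_day[Fraction(576, 5)]`, the probability of a price of 115.20 at time 3, equals 12/27, which `Fraction` stores in lowest terms as 4/9.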

In many common applications (as in this one), there is the further simplifying feature that
$p_{xy}(k, k + 1)$ is independent of $k$, and can be denoted by just $p_{xy}$. In this case, we say that we
have stationary transition probabilities. (This feature is also referred to as time homogeneity.)
In connection with stochastic processes, the word stationary can be thought of as referring 
to a process ‘without a watch’. In the present context, it means that whenever the evolving 
quantity takes a value of x, the probability that it takes a value of y at the next stage is always 
the same, regardless of the particular time. 


18.2.2 Examples 


We now look at some other examples of Markov chains. One of the most famous is the random 
walk. An indecisive person goes for a walk but cannot decide whether to go east or west. 
A coin is flipped and the person goes 1 unit east if a head comes up or 1 unit west if a tail 
comes up. After each move, the coin is flipped again and the procedure repeated. Suppose the 
probability of a head coming up is $p$. Letting $X_k$ refer to his position east of the starting point
at time $k$, we have the process shown in Figure 18.2.

What is the probability of being 1 unit to the west of our starting point at time 3, that is,
at position $-1$? This is similar to the stock question asked above. We have three possible path
segments leading to $-1$ at time 3, each with probability $p(1 - p)^2$, so the answer is $3p(1 - p)^2$.

This same model applies to many situations. Imagine a gambler repeatedly playing a game 
with an even money payoff, such as betting on black at roulette. In any single play, this person 
either wins 1 unit with probability $p$ or loses 1 unit with probability $1 - p$. If we let $X_k$ denote
the total winnings after $k$ plays (a negative amount signifying a loss), then we have exactly
the random walk process as described above. 

Many of the processes that we will deal with can be viewed in this gambling context. It 
will be convenient to make a slight adjustment. Instead of keeping track of the amount won 
or lost, we will keep track of the total fortune of the gambler starting from an initial fortune 
of u. This is just a matter of adding u to each entry. For example, if the gambler starts play 
with a fortune of 10 units, then the diagram of the process would be as shown in Figure 18.3. 

Consider a situation where the wager at each stage is more complicated than a simple even
money bet. Suppose that the return to the gambler on this bet is some discrete random variable
$G$. (In the case above, $G$ simply took the value 1 with probability $p$ and $-1$ with probability
$1 - p$.) The gambler starts with a fortune of $u$, and we want to consider the stochastic process
$U_n$, which equals the fortune of the gambler at time $n$. Letting $G_k$ be the return at time $k$, we
can write

$$U_n = u + G_1 + G_2 + \cdots + G_n, \tag{18.3}$$

where the $G_k$ are independent and each distributed as $G$. This is clearly a Markov chain with
transition probabilities given by

$$p_{xy}(k, k + 1) = P(G = y - x),$$

and since the right-hand side is independent of $k$, we see also that this process has stationary
transition probabilities.
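
The transition rule $p_{xy}(k, k + 1) = P(G = y - x)$ gives a simple way to compute the distribution of $U_n$ one step at a time. The following sketch is illustrative only. The function name and the dictionary representation of the distribution of $G$ are assumptions made for the example.

```python
from fractions import Fraction


def fortune_distribution(u, n, g_dist):
    """Distribution of U_n = u + G_1 + ... + G_n, where g_dist maps each
    possible return g to P(G = g). Each step moves probability from x to
    y = x + g with the transition probability P(G = y - x)."""
    dist = {u: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for x, px in dist.items():
            for g, pg in g_dist.items():
                nxt[x + g] = nxt.get(x + g, Fraction(0)) + px * pg
        dist = nxt
    return dist


p = Fraction(1, 3)                  # hypothetical probability of winning
even_money = {1: p, -1: 1 - p}      # the random-walk case
walk = fortune_distribution(0, 3, even_money)
```

With the even money bet this reproduces the random walk: `walk[-1]` equals $3p(1 - p)^2$, which is 4/9 for $p = 1/3$. Starting from a fortune of 10 instead of 0 simply shifts every position by 10.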

Note that it would still be a Markov chain, but not necessarily a stationary one, if the 
gambler were to change the bet at various times, as long as this was done independently of 
the past history of winnings. An example of this would be for the gambler to decide that after 
five turns at the roulette table, regardless of what happens, he will switch to blackjack. In this 
case, the distributions of the various $G_k$ are not all the same. If, however, the gambler decided 
that he would play roulette until he had five consecutive losses and then switch to blackjack, 
the resulting process would not have the Markov property. 

We return to the process given by (18.3) in Chapter 23 where it will play an important 
role in modelling the surplus of an insurer. 
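
As a minimal sketch of (18.3), the following Python fragment builds one realization of the gambler's fortune by adding independent draws of the return to the initial fortune. The function name `simulate_fortune`, the fixed seed and the win probability 18/38 (an even-money bet on a double-zero wheel) are illustrative assumptions and do not come from the text.

```python
import random

def simulate_fortune(u, returns, probs, n, rng):
    """Return [U_0, U_1, ..., U_n], where U_k = u + G_1 + ... + G_k."""
    path = [u]
    for _ in range(n):
        g = rng.choices(returns, weights=probs)[0]  # one independent draw of G
        path.append(path[-1] + g)
    return path

rng = random.Random(2024)  # fixed seed, used only to make the run reproducible
# Assumed bet: win 1 with probability 18/38, lose 1 with probability 20/38.
path = simulate_fortune(10, [1, -1], [18 / 38, 20 / 38], 20, rng)
print(path)
```

Because each draw is made without reference to earlier ones, the next fortune depends on the past only through the current fortune, which is the Markov property discussed above.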


18.3 Martingales 


In this section, we briefly introduce another important class of stochastic processes, which we 
will make much use of later. 


Definition 18.2 A discrete-time stochastic process is a martingale if 

$$E(X_{k+1} \mid X_k = x_k, X_{k-1} = x_{k-1}, \ldots, X_0 = x_0) = x_k$$

for all $k$ and values $x_0, x_1, \ldots, x_k$. In other words, at any time, the expected value of our quantity 
at the next time period is exactly what it is now. 


Note that in the case of a Markov process, the requirement above simplifies to 
$$E(X_{k+1} \mid X_k = x) = x.$$


Is our stock price process with the 2/3 and 1/3 probabilities a martingale? The answer is 
no, since 

$$E(X_{k+1} \mid X_k = x) = \tfrac{2}{3}(1.2x) + \tfrac{1}{3}(0.8x) = \tfrac{16}{15}x > x.$$

A process like this, for which the expectation of the future random variable is always 
greater than or equal to the present one, is known as a submartingale. It would only be a 
martingale if the 2/3 were replaced by 1/2. Indeed, we would not expect a stochastic process 
for stock prices to be a martingale, since it would mean that there was no tendency for the 
price to increase. However, the reason for purchasing stock in the first place, given the inherent 
risk, is the expectation of capital gains as the stock increases in value. 

What about the process given by (18.3)? We have 

$$E(U_{n+1} \mid U_n = x) = E(U_n + G_{n+1} \mid U_n = x) = x + E(G).$$


This process will be a martingale if and only if E(G) = 0, that is, if and only if the bet 
made by the gambler is a fair bet. The martingale concept was in fact introduced originally 
as the model of a fair game. In the usual casino games, E(G) < 0, and we would not have a 
martingale. Such a process, where the expected value of the future random variable is always 
less than or equal to the present one, is known as a supermartingale. A martingale is therefore 
both a supermartingale and a submartingale. (To remember the terminology, note that the 
modifier applies to the current value. So a submartingale means that at any time, the current 
value of the process is under the expectation of the future value.) 
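
The classification above depends only on the sign of E(G), which is easy to check with exact arithmetic. The sketch below is illustrative: the helper names `expected_return` and `classify` are hypothetical, and the roulette probabilities 18/38 and 20/38 assume a double-zero wheel, which the text does not specify.

```python
from fractions import Fraction

def expected_return(dist):
    """E(G) for a discrete return G given as {value: probability}."""
    return sum(Fraction(g) * p for g, p in dist.items())

def classify(dist):
    """Classify the fortune process U_n = u + G_1 + ... + G_n by the sign of E(G)."""
    mean = expected_return(dist)
    if mean == 0:
        return "martingale"
    return "submartingale" if mean > 0 else "supermartingale"

fair_coin = {1: Fraction(1, 2), -1: Fraction(1, 2)}
roulette = {1: Fraction(18, 38), -1: Fraction(20, 38)}  # assumed double-zero wheel
print(classify(fair_coin), expected_return(fair_coin))
print(classify(roulette), expected_return(roulette))

# The stock price example is multiplicative: the expected one-step growth factor.
growth = Fraction(2, 3) * Fraction(6, 5) + Fraction(1, 3) * Fraction(4, 5)
print(growth)  # exceeds 1, so that process is a submartingale
```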


18.4 Finite-state Markov chains 


Sometimes, it is convenient to relabel with integers the possible values that can be taken by 
the random variables in a Markov chain. We refer to these integers as the states of the system 
and say that the system is in state $i$ at time $k$ if $X_k$ takes the value $i$. In fact we often just 
take N, the number of the state, as the underlying random variable in place of $X_k$, so that our 
fundamental probabilities $p_{ij}(k, n)$ will be the probability of being in state $j$ at time $n$ given 
that the process was in state i at time k. In this section we consider Markov chains with a 
finite number of states and develop matrix methods for investigating their properties. In this 
chapter we confine our attention to the stationary case. Nonstationary chains are needed in 
Chapter 19 and will be introduced at the beginning of that chapter. 

Starred sections contain somewhat more advanced material and these can be omitted at first 
reading (although we do refer back to one of the results in one of the Chapter 23 examples). 


18.4.1 The transition matrix 


Suppose we have a stationary Markov chain with $N$ states. We will number them $\{0, \ldots, N - 1\}$. 
(Some authors use the numbers 1 to $N$ instead.) By the stationary condition, the quantity 
$p_{ij}(k, k + 1)$, the probability of moving from state $i$ to state $j$ in one step, is independent of $k$, 
and we can write it as just $p_{ij}$. The matrix $P$, with entries of $p_{ij}$ in the $i$th row and $j$th column, is 
known as the transition matrix of the chain. 

Note that the only condition required for an N x N matrix to be the transition matrix of 
some Markov chain is that all entries are nonnegative, and each row sums to 1. 


Remark The reader should keep in mind that we have started the indexing at 0, so the 
top row (or left column) of the matrix will be considered as row (or column) number 0, not 
number 1. 


Example 18.1 A box contains two balls, either of which can be red or yellow. At each 
stage, a ball is chosen at random and replaced with a ball of the opposite colour. Let $X_n$ be the 
number of red balls in the box at time n. Find the transition matrix. 


Solution. Here we have a three-state Markov chain. We will number the states by the number 
of red balls in the box. If there are 0 or 2 red balls, we are sure to move to state 1, while if there 
is 1 red ball, we move to either state 0 or 2, with equal probability. Therefore the transition 
matrix is 


$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \end{pmatrix}.$$
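
The matrix of Example 18.1 can also be generated directly from the rule of the game, which gives a check on the reasoning. In the sketch below the function name `ball_chain` and the parameter `n_balls` are hypothetical; the example itself uses two balls.

```python
from fractions import Fraction

def ball_chain(n_balls=2):
    """Transition matrix for the number of red balls; states are 0..n_balls."""
    size = n_balls + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    for r in range(size):
        if r > 0:
            # A red ball is drawn (probability r/n_balls) and becomes yellow.
            P[r][r - 1] = Fraction(r, n_balls)
        if r < n_balls:
            # A yellow ball is drawn and becomes red.
            P[r][r + 1] = Fraction(n_balls - r, n_balls)
    return P

P = ball_chain()
for row in P:
    print([str(x) for x in row])
print(all(sum(row) == 1 for row in P))  # every row of a transition matrix sums to 1
```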


18.4.2 Multi-period transitions 


We now wish to calculate the general probabilities $p_{ij}(k, k + m)$, which by stationarity will be 
equal to $p_{ij}(0, m)$. We start by calculating this for $m = 2$. Suppose we move from state $i$ to 
state $k$ at time 1 and then from state $k$ to state $j$ at time 2. We know from our previous discussion 
on Markov chains that the probability of this two-step move occurring is just $p_{ik}p_{kj}$. Summing 
this over all states $k$, we get the probability of moving from $i$ to $j$ after two periods as 


$$\sum_{k=0}^{N-1} p_{ik}p_{kj}, \tag{18.4}$$


which, by the ordinary rules of matrix multiplication, is just $P^2(i,j)$, the $(i,j)$th entry of 
$P \times P = P^2$. The same argument can be repeated to show the important fact that in a 
stationary chain, 


$p_{ij}(0, m)$ is equal to the entry in row $i$ and column $j$ of $P^m$. (18.5) 


(Note that $P^0$ is just the identity matrix, usually denoted by $I$, with entries of 1 on the main 
diagonal, and entries of 0 elsewhere.) 
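
A short sketch of (18.5) in Python, using the matrix of Example 18.1 and exact fractions. The helper names `mat_mul` and `mat_pow` are hypothetical, and repeated multiplication is used for clarity rather than efficiency.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Ordinary matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, m):
    """P raised to the power m, starting from the identity matrix for m = 0."""
    n = len(P)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, P)
    return result

half = Fraction(1, 2)
P = [[0, 1, 0], [half, 0, half], [0, 1, 0]]
P2 = mat_pow(P, 2)
print(P2[0][0])            # two-step probability of going from state 0 back to 0
print(mat_pow(P, 3) == P)  # the third power reproduces P for this chain
```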


18.4.3 Distributions 


Given a finite-state Markov chain, what is the distribution of $X_n$? This is a basic question 
that we ask about any stochastic process. The distribution can be given as an $N$-dimensional 
vector $\pi_n$, whose $i$th entry, denoted by $\pi_n(i)$, equals $P(X_n = i)$. This will depend of course on 
$\pi_0$, the vector giving the initial distribution at time 0. (In certain applications, we might know 
the value of $X_0$, in which case $\pi_0$ will simply be a vector with a single entry of 1, and other 
entries equal to 0.) Starting with $n = 1$, we calculate 


$$P(X_1 = j) = \sum_{i} \pi_0(i)\, p_{ij},$$


which is just the $j$th entry of the vector obtained by multiplying the vector $\pi_0$ (viewed as a 
1 x N matrix) on the right by P. The same argument holds for any n, and we can conclude 


$$\pi_n = \pi_0 P^n. \tag{18.6}$$


Example 18.2 In Example 18.1, suppose we start with a uniform distribution, that is, there 
is a 1/3 chance that $X_0$ takes each of the values 0, 1, 2. What is the probability that $X_{99} = 1$? 

Solution. Computing large powers of matrices is often done by calculating eigenvalues, but 
we can avoid that here. After a couple of multiplications, we see that $P^3 = P$. Therefore 
$P^5 = P^3P^2 = PP^2 = P$, and similarly, any odd power of $P$ equals $P$. From (18.6), 


$$\pi_{99} = (1/3, 1/3, 1/3) \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \end{pmatrix} = (1/6, 2/3, 1/6).$$


The probability that $X_{99}$ equals 1 is 2/3. 
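
The same conclusion can be reached without any matrix powers, by applying (18.6) one step at a time. The sketch below starts from the uniform distribution and records the distribution after each of the first five steps; the function name `step` is hypothetical.

```python
from fractions import Fraction

def step(pi, P):
    """One application of (18.6): the row vector pi multiplied on the right by P."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(P))]

half, third = Fraction(1, 2), Fraction(1, 3)
P = [[0, 1, 0], [half, 0, half], [0, 1, 0]]
history = [[third, third, third]]  # uniform initial distribution
for _ in range(5):
    history.append(step(history[-1], P))
for n, pi in enumerate(history):
    print(n, [str(x) for x in pi])
```

With this starting distribution the vectors at odd times all agree, as do those at even times, which reflects the fact that every odd power of the matrix equals the matrix itself.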


Remark Some writers define the transition matrix $P$ by taking the entry in row $i$ and column 
$j$ as the probability of moving from state $j$ to state $i$. In this case, columns, rather than rows, 
add to 1. The vectors $\pi_n$ are written as column vectors and in (18.6), we multiply $\pi_0$ on the 
left by $P^n$ to get $\pi_n$. 


*18.4.4 Limiting distributions 


The distributions $\pi_n$ will normally continue to change with time, but in many applications, 
we would like to show that there is some limiting distribution $\pi$ which is independent of the 
initial state at time 0. That is, for any initial state $j$, and all states $i$, the resulting probabilities 
$\pi_n(i)$ will converge to $\pi(i)$ as $n$ approaches $\infty$. This would enable us to predict with reasonable 
accuracy the probabilities of being in various states, provided the process has been continuing 
for a sufficiently long time. We consider here the problem of finding this limiting distribution, 
provided it exists. The last provision is necessary, as indicated by the following examples 
where a limiting distribution does not exist. 

Consider the chain of Example 18.1. If the process is in state 0 or 2, it will move to state 
1 in one transition. If it is in state 1, it will move to state 0 or 2 in one transition. So if we 
start, say in state 0, the process is sure to be in state 0 or 2 at even times, and in state 1 at odd 
times. Clearly, no limiting distribution can exist. This is an example of what is known as a 
periodic chain of period 2. In general, this means we can divide the set $S$ of all states into two 
disjoint subsets, $S_1$ and $S_2$, such that in any one transition, all states in $S_1$ move to $S_2$ and all 
states in $S_2$ move to $S_1$. More generally, we could have chains of period $d$ where we can find 
$d$ pairwise disjoint subsets, such that from each subset, we cycle through the other sets in a 
fixed order, and return to the original set after $d$ transitions. 

Another example where a limiting distribution will not exist is the chain with transition 
matrix 


$$P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$


In this case, if we start in state 2 at time 0, then we remain in state 2 forever. If we start in 
state 0 or 1, we will reach a limiting distribution, which is to be in state 0 or 1 with equal 
probabilities. (In fact, this limiting distribution is achieved exactly at time 1.) The problem 
here is that the limiting distribution varies according to the initial state. This is an example of 
what is known as a reducible chain. This means we can divide the set $S$ of all states into two 
nonempty disjoint subsets $S_1$ and $S_2$ such that any state in $S_1$ transfers to a state in $S_1$ 
after one transition and any state in $S_2$ transfers to a state in $S_2$ after one transition. We 
can really consider such a Markov chain as two separate chains, one comprising the states in 
$S_1$ and the other comprising the states in $S_2$. 

A simple condition that ensures the existence of a limiting distribution is that there is a 
positive integer $n$ such that all entries of $P^n$ are positive. That is, given any states $i, j$ (not 
necessarily distinct), there is some chance of getting from i to j in n transitions. A proof of this 
result is beyond the scope of this book. It is, however, not difficult to see that this condition 
rules out both periodic and reducible chains. See Exercise 19.12. 
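
The positivity condition can be tested mechanically by computing successive powers. The sketch below is illustrative only: the function name `first_positive_power`, the search cutoff `max_n` and the two-state matrix `regular` are assumptions introduced here, and a result of `None` shows merely that no suitable power was found up to the cutoff.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Ordinary matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_positive_power(P, max_n):
    """Smallest n <= max_n for which every entry of P^n is positive, else None."""
    power = P
    for n in range(1, max_n + 1):
        if all(x > 0 for row in power for x in row):
            return n
        power = mat_mul(power, P)
    return None

half = Fraction(1, 2)
periodic = [[0, 1, 0], [half, 0, half], [0, 1, 0]]         # Example 18.1
reducible = [[half, half, 0], [half, half, 0], [0, 0, 1]]  # the reducible chain
regular = [[0, 1], [half, half]]                           # hypothetical chain
for name, matrix in [("periodic", periodic), ("reducible", reducible),
                     ("regular", regular)]:
    print(name, first_positive_power(matrix, 50))
```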

If we know that a limiting distribution $\pi$ exists, then there is a straightforward procedure 
for finding it from the matrix $P$. We know that $\pi = \lim_{n \to \infty} \pi_n$ (in the sense of converging at 
each $i$ as described above). By continuity considerations, we must have $\pi P = \lim_{n \to \infty} \pi_n P$. 
From (18.6), we know that $\pi_n P = \pi_0 P^n P = \pi_0 P^{n+1} = \pi_{n+1}$. But clearly $\pi_{n+1}$ also converges 
to $\pi$. This establishes that 


$$\pi = \pi P. \tag{18.7}$$


Note that a solution $\pi$ of Equation (18.7) need not be a limiting distribution, since none 
may exist. It is true that once we have a distribution $\pi$ satisfying this equation, we will 
remain with that distribution forever, but it is possible that from some initial states we will 
never converge to $\pi$. Example 18.1 is an illustration. We have exactly one solution to (18.7), 
namely $\pi = (1/4, 1/2, 1/4)$, but as we have shown, we will not approach this for all initial 
distributions. (It is of interest to note that this is the distribution we would get if we started 
the process by choosing the colour of each ball randomly.) 
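
Equation (18.7), together with the requirement that the entries sum to 1, is a linear system, so it can be solved by elimination. The following sketch does this for Example 18.1 and then shows that the distributions starting from state 0 do not settle down. The helper names `solve`, `stationary` and `step` are hypothetical, and `solve` assumes the system has a unique solution.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (unique solution assumed)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

def stationary(P):
    """Solve pi = pi P together with sum(pi) = 1."""
    n = len(P)
    # Equation j says that entry j of pi (P - I) is 0. One of these equations is
    # redundant because the rows of P sum to 1, so the last is replaced by the
    # normalization condition.
    A = [[P[i][j] - int(i == j) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])

def step(pi, P):
    """The row vector pi multiplied on the right by P."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(P))]

half = Fraction(1, 2)
P = [[0, 1, 0], [half, 0, half], [0, 1, 0]]
pi = stationary(P)
print([str(x) for x in pi])
orbit = [[1, 0, 0]]  # start in state 0 with certainty
for _ in range(4):
    orbit.append(step(orbit[-1], P))
print(orbit[1] == orbit[3], orbit[2] == orbit[4], orbit[1] == orbit[2])
```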


*18.4.5 Recurrent and transient states 


The states of a stationary Markov chain can be divided into two classes. Given any state j, let 
$f_j$ be the probability that, starting in state $j$, the process will return to that state. 


Definition 18.3 State $j$ is said to be transient if $f_j < 1$. In this case, there is some chance that 
the process will never return. 
State $j$ is said to be recurrent if $f_j = 1$. In this case, the process is sure to return. 


Example 18.3 Classify the states of the Markov chain with transition matrix 


$$P = \begin{pmatrix} 3/4 & 1/4 & 0 \\ 1/2 & 1/2 & 0 \\ 1/4 & 0 & 3/4 \end{pmatrix}.$$


Solution. State 2 is transient, since starting at state 2, there is a positive chance of moving to 
state 0. From there, only states 0 and 1 can be reached, and there is no return to state 2. 

We will show that states 0 and 1 are recurrent. Suppose we are in state 0. We first observe 
that it is certain that we will eventually get to state 1. For any positive integer $n$, the probability 
of staying in state 0 forever is no greater than the probability of staying in state 0 for the 
next $n$ transitions, which is $(3/4)^n$. As this quantity approaches 0, we see that the probability 
of staying in state 0 forever is 0, and so with probability 1 we will eventually move to state 1. 
Similarly, if we are in state 1, we are certain to move to state 0. Therefore, if we are in state 0, 
we are certain to move to state 1 and then back to state 0. Arguing in the same way for state 
1, we see that both states are recurrent. 

This is a fairly simple case, and in general it might not be so easy to classify the states 
from the matrix. We can, however, adapt the argument in Example 18.3 to prove a general 
result, which can be of great help in the classification. 


Definition 18.4 We say a state j is reachable from a state i if there is a positive probability 
of eventually moving from state $i$ to $j$. In terms of the matrix $P$, this condition can be stated as 
$P^n(i,j) > 0$ for some $n$. 


Theorem 18.1 If $i$ is recurrent and $j$ is reachable from $i$, then $j$ is recurrent. 


Proof. Starting in state j, we are certain to eventually reach state i, for if not, then there 
would be a positive probability of going from i to j and never returning to i, contradicting the 
fact that $i$ is recurrent. Now, starting in state $i$, let $a$ be the probability that we will return to 
state $i$ without ever hitting state $j$. Since $j$ is reachable from $i$, we must have that $a < 1$. The 
probability that starting in state $i$ we will make $n$ return visits to $i$ without ever hitting state 
$j$ is $a^n$, which approaches 0 as $n$ goes to $\infty$. Therefore, the probability is 0 that, starting in $i$, 
we never reach state $j$, which means that we are in fact certain to eventually reach state $j$. To 
conclude, starting in state j, we are certain to reach state i and certain to come back to state j 
from state i, showing that state j is recurrent. 


Starting in a recurrent state, we must always remain in recurrent states. What happens if 
we start in a transient state? 


Theorem 18.2 In a finite-state stationary Markov chain, there is at least one recurrent 
state. Moreover, starting from any transient state, we must eventually reach a recurrent state. 


Proof. Given a transient state j, consider all realizations of the process for which there is at 
least one occurrence of j. The probability that there will be exactly one such occurrence, given 
that there is at least one, is $1 - f_j$, the probability of never returning to $j$. The probability that 
there will be exactly two such occurrences, given that there is at least one, is just $f_j(1 - f_j)$. The 
process must return once, and then never return again. Continuing, we see that the number 
of occurrences of $j$, less 1, given that there is at least 1, has a geometric distribution. (See 
Section A.11.3.) Now a geometric distribution is a proper frequency distribution with no 
probability of assuming the value $\infty$. Our conclusion is that the probability of infinitely many 
occurrences of any transient state j is 0. Consider the event that the chain never visits a 
recurrent state. If this occurs, since there are only finitely many transient states, one of them 
must appear infinitely often, but as we have seen the probability of this is 0. This means that 
with certainty, we must reach a recurrent state. 


We can apply the last two theorems to Example 18.3. Once we see that state 2 is transient, 
we know by Theorem 18.2 that either state 0 or state 1 is recurrent, and then by Theorem 18.1 
(or even by symmetry) they both must be recurrent. 
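
For a finite chain the classification can be automated from the reachability relation of Definition 18.4. The sketch below relies on a standard fact that is not proved in the text: in a finite chain, a state is recurrent exactly when every state reachable from it can reach it again (one direction is the first step in the proof of Theorem 18.1). The function names are hypothetical.

```python
from fractions import Fraction

def reachable_sets(P):
    """reach[i] = set of states j with P^n(i, j) > 0 for some n >= 1."""
    n = len(P)
    reach = [{j for j in range(n) if P[i][j] > 0} for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            # Anything reachable from a state reachable from i is reachable from i.
            new = set().union(*(reach[k] for k in reach[i])) - reach[i]
            if new:
                reach[i] |= new
                changed = True
    return reach

def classify_states(P):
    """Label each state of a finite stationary chain as recurrent or transient."""
    reach = reachable_sets(P)
    return ["recurrent" if all(i in reach[j] for j in reach[i]) else "transient"
            for i in range(len(P))]

q, h = Fraction(1, 4), Fraction(1, 2)
example_18_3 = [[3 * q, q, 0], [h, h, 0], [q, 0, 3 * q]]
print(classify_states(example_18_3))
```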

Note that Theorem 18.2 need not hold for an infinite-state chain. For a trivial example, 
take the random walk where the probability of moving to the right is 1. All states are transient. 

Some transient states are ‘less transient’ than others. That is, the expected time spent 
in that state is longer. Indeed, for many applications we may be interested in this expected 
time spent in a certain transient state $j$, that is, the expected number of $n$ for which $X_n = j$. 
(There is no point in asking this question for a recurrent state, since, by definition, the process 
is in the state for infinitely many values of $n$.) Of course, this expectation may depend on 
the starting state $i$. Once again this question is of interest only for a transient starting state 
$i$, since for a recurrent starting state, we know by Theorem 18.1 that the answer is 0. We 
calculate this expectation by the familiar trick of looking at indicator random variables (see 
the end of Section A.5). Fix such a starting state $i$ and consider the random variable $I_n$ that 
takes the value 1 if $X_n = j$, or 0 if $X_n \neq j$. Then the expected number of visits to state $j$ is 
just $E\left[\sum_{n=0}^{\infty} I_n\right] = \sum_{n=0}^{\infty} E(I_n)$, since A.22 extends to infinite sums for nonnegative random 
variables. Since $E(I_n) = p_{ij}(0, n)$, we have 


$$\text{Expected visits to state } j \text{ starting from state } i = \sum_{n=0}^{\infty} p_{ij}(0, n). \tag{18.8}$$


The right-hand side of (18.8) may seem quite formidable to calculate. It is possible to use 
eigenvalues of the transition matrix $P$ to shorten this (see Section 19.3.4), but there is a much 
simpler method, based on an idea that is useful in many contexts. Recall the formula for an 
infinite geometric progression. For a number $x$ of absolute value less than 1, 


$$(1 - x)^{-1} = 1 + x + x^2 + \cdots.$$


Similarly, it can be shown that for a matrix Q with sufficiently small entries, the matrix 
I — Q is invertible, and 


$$(I - Q)^{-1} = I + Q + Q^2 + \cdots.$$


Suppose that the transient states in our matrix are numbered $0, \ldots, m - 1$ and take $Q$ to be 
the $m \times m$ submatrix consisting of the first $m$ rows and first $m$ columns. Using Theorem 18.1, 
we can show that for $i, j$ between 0 and $m - 1$, the $(i, j)$th entry of $Q^n$ is equal to $p_{ij}(0, n)$, with 
the final result that for any two transient states $i, j$, not necessarily distinct, 


Expected visits to state $j$ starting from state $i$ = the $(i, j)$th entry of $(I - Q)^{-1}$. (18.9) 


Example 18.4 (Random walk with absorbing barriers) Consider a random walk on four 
consecutive points 1,2,3,4 on a line. Starting at either 2 or 3, the process moves right with 
probability 2/3 or left with probability 1/3. Whenever the process gets to either 1 or 4, it 
remains there forever. Classify the four states as recurrent or transient, and for each pair 
$(i, j)$ of transient states find the expected number of times the process will be in $j$ starting from 
state i. 


Solution. Points 2 and 3 are clearly transient. Points 1 and 4 are clearly recurrent, in fact a 
particular type of recurrent state, known as an absorbing state, which means that once there, 
you never leave. Numbering point 2 as state 0 and point 3 as state 1, for the transient state 
submatrix Q, we have 


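$$Q = \begin{pmatrix} 0 & 2/3 \\ 1/3 & 0 \end{pmatrix}, \qquad I - Q = \begin{pmatrix} 1 & -2/3 \\ -1/3 & 1 \end{pmatrix}, \qquad (I - Q)^{-1} = \begin{pmatrix} 9/7 & 6/7 \\ 3/7 & 9/7 \end{pmatrix}.$$
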
The final matrix gives the expected number of visits. For example, starting at point 2, the 
expected number of times the process visits point 3 is 6/7. 
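
The calculation in (18.9) is easy to reproduce with exact arithmetic. The sketch below inverts the matrix by Gauss-Jordan elimination for the random walk of Example 18.4. The function names `inverse` and `expected_visits` are hypothetical, and `inverse` assumes that its argument is invertible.

```python
from fractions import Fraction

def inverse(A):
    """Exact inverse of a square matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def expected_visits(Q):
    """Matrix of expected visits among transient states: the inverse of I - Q."""
    n = len(Q)
    return inverse([[int(i == j) - Q[i][j] for j in range(n)] for i in range(n)])

Q = [[0, Fraction(2, 3)], [Fraction(1, 3), 0]]  # transient states of Example 18.4
V = expected_visits(Q)
for row in V:
    print([str(x) for x in row])
```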


18.5 Introduction to continuous time processes 


We now turn our attention to processes where the index set T takes all values in the interval 
$[0, \infty)$. We will also allow continuous rather than discrete random variables. 

One goal is to define a continuous-time analogue of the process given in the discrete case 
by (18.3). In Chapter 23, we apply this to modelling the surplus of an insurer. Another goal 
is to introduce Brownian motion which can be viewed as a type of continuous-time random 
walk, which we apply in Chapter 20. 


Notation For continuous-time processes, we will write the time variable in brackets rather 
than as a subscript. That is, we write $X(t)$ in place of $X_t$. 

We write $X \sim Y$, where $X$ and $Y$ are any two random variables, to mean that they have the 
same distribution. 


We begin with a few general definitions that capture some of the features we introduced 
in the discrete time setting. Given a stochastic process X(t) and two times s < t, the random 
variable X(t) — X(s) is called an increment of the process, since it gives the increase in the 
value over the period running from time $s$ to time $t$. 


Definition 18.5 We say that the process has independent increments if the increments over 
disjoint time intervals are independent. 


This constitutes a strong version of the Markov property. Given times $s < t$, we can write 
$X(t) = X(s) + [X(t) - X(s)]$. This shows that the value of $X(t)$ can certainly depend on the value 
of X(s), but we get no additional information from looking at times before s, since both 
X(t) — X(s) and X(s) are independent of what happened in the interval [0, s). 


Definition 18.6 We say that the process has stationary increments if the distribution of any 
increment depends only on the length of the time interval and not the particular starting point. 
That is, given any s, t, h > 0, we require that 


$$X(s + h) - X(s) \sim X(t + h) - X(t).$$


This constitutes a strong version of the assumption of stationary transition probabilities 
that we made for discrete Markov chains. 


18.6 Poisson processes 


Suppose we have a particular ‘event’ that we are interested in, occurring repeatedly and 
randomly in time. This could be an insurance claim or the arrival of a person at a queue, or a 
number of other possibilities. A counting process is a stochastic process N(t) that counts the 
number of such ‘events’ that have occurred up to time t. Formally, it is just any continuous-time 
process that takes nonnegative integer values, and such that all realizations are increasing. A 
particular realization can then always be drawn as an increasing step function. We will always 
assume that in any counting process, $N(0) = 0$. In other words, the counting starts at time 0 
before any events have occurred. A major application in this text will be to insurance claims, 
which is discussed in Chapter 23. 
We will confine attention to the particular case of Poisson processes defined as follows. 


Definition 18.7 A counting process $N(t)$ is called a Poisson process with rate $\lambda$ if it has 
stationary and independent increments and if, for all $h > 0$, 


$$N(h) \sim \text{Poisson}(\lambda h).$$


It follows from the stationary increment assumption that, given any $t > 0$, the random 
variable $N(t + h) - N(t) \sim \text{Poisson}(\lambda h)$. In other words, a Poisson process is simply a counting 
process with independent increments, such that the number of occurrences in any time interval 
has a Poisson distribution, with parameter proportional to the length of the interval. 

When should we choose a Poisson process to model a counting situation? There is another 
characterization of Poisson processes that gives some insight into answering this question. 
Suppose we assume that the increments are indeed stationary and independent. The alternate 
formulation says essentially that we will get a Poisson process if it is ‘highly unlikely’ to 
have more than one event occurring in a ‘sufficiently small’ time interval. Therefore, if you 
feel that this is the case for the particular event you are trying to model, you can be justified 
in choosing the Poisson process. To derive this characterization, we must first give precise 
meaning to the phrases ‘highly unlikely’ and ‘sufficiently small’. This is done conveniently 
through the ‘little o’ notation, which we will now review. 

We say that a function $f$ defined on an interval $[0, b]$ is $o(h)$ if $\lim_{h \to 0} f(h)/h = 0$. This 
means that $f$ is getting small rapidly as $h$ gets small, more rapidly than $h$ itself. For example, 
$f(h) = h^2$ is $o(h)$, while $f(h) = \sqrt{h}$ is not. We often write the symbol $o(h)$ for such a function. 
For example, we would write 


$$e^{\beta h} = 1 + \beta h + o(h),$$


as can be seen from the Taylor series expansion. It is clear that, given two functions f and g 
that are both o(h), their sum f + g is o(h), and for any constant c the function cf is o(h) as 
well. We can now state the desired characterization. 


Theorem 18.3 A counting process that has stationary and independent increments is a 
Poisson process with rate $\lambda$ if and only if the following hold: 


(i) $P(N(h) = 1) = \lambda h + o(h)$, 
(ii) $P(N(h) \geq 2) = o(h)$. 


Partial proof. One direction is clear. For a Poisson process with rate $\lambda$, we have 
$P[N(h) = 1] = \lambda h e^{-\lambda h} = \lambda h[1 - \lambda h + o(h)] = \lambda h + o(h)$. Similarly, we have 
$P[N(h) = 0] = e^{-\lambda h} = 1 - \lambda h + o(h)$. Therefore, 
$P[N(h) \geq 2] = 1 - P[N(h) = 0] - P[N(h) = 1] = 1 - (1 - \lambda h + o(h)) - (\lambda h + o(h)) = o(h)$. 

The converse is the more difficult part. We assume conditions (i) and (ii) and must show 
that N(h) is Poisson. There are various methods of doing this and we will not elaborate further. 
A nice proof can be found in Ross (2010, Theorem 5.1) where this is done by calculating the 
moment generating function of N(h). 
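
The easy direction of Theorem 18.3 can be illustrated numerically. For a Poisson count with mean equal to the rate times h, the sketch below prints the two error terms divided by h for shrinking h; both ratios should tend to 0. The rate 2 is a hypothetical value chosen only for illustration.

```python
import math

def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

lam = 2.0  # hypothetical rate, chosen only for illustration
rows = []
for h in (0.1, 0.01, 0.001):
    p0 = poisson_pmf(0, lam * h)
    p1 = poisson_pmf(1, lam * h)
    p2_plus = 1.0 - p0 - p1
    # (P(N(h) = 1) - lam * h) / h and P(N(h) >= 2) / h
    rows.append((h, (p1 - lam * h) / h, p2_plus / h))
for row in rows:
    print(row)
```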


18.6.1 Waiting times 


Given any counting process $N(t)$, there is an associated waiting-time process, $W_n$, $n = 1, 2, \ldots$, 
where $W_n$ is the time between the $(n - 1)$th event and the $n$th event. We take the 0th event as 
occurring at time 0. So, for example, if the first event occurs at time 1, the second at time 1.7, 
the third at time 2.3, we would have $W_1 = 1$, $W_2 = 0.7$, and $W_3 = 0.6$. For some problems, 
it is more convenient to deal with $W_n$ rather than $N(t)$. Note that the waiting-time process is 
a discrete-time process (although the index does not refer to times exactly) with continuous 
random variables, while the counting process is a continuous-time process with discrete 
random variables. 

A natural question is to investigate the waiting-time process for a Poisson process, 
and this is easily answered. Let $\lambda$ be the rate of the process. Suppose that $W_1 = w_1$, $W_2 = w_2$, 
$\ldots$, $W_{n-1} = w_{n-1}$. What is the distribution of $W_n$? If $s = w_1 + w_2 + \cdots + w_{n-1}$, the $(n - 1)$th 
occurrence was at time $s$, and in order for $W_n$ to be greater than $w$, we require 
that there be no occurrences in the interval $(s, s + w]$, which, by the independent and stationary 
increment assumptions, has the same probability as no occurrences in the interval $(0, w]$. This probability 
is just $e^{-\lambda w}$, which is easily recognized as the survival function of an exponential distribution. 
We conclude that the $W_n$ are independent and each is distributed as Exp($\lambda$). 

Another random variable that is of interest in many applications is $T_n$, the time of the $n$th 
occurrence (sometimes called the $n$th arrival time). Clearly, $T_n = W_1 + W_2 + \cdots + W_n$, and it 
is easy to see using moment generating functions that $T_n$ is distributed as Gamma($n$, $\lambda$). 
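
The link between the two descriptions suggests a simple way to simulate a Poisson process: add up independent exponential waiting times. The sketch below first recovers the waiting times of the numerical example above and then checks by simulation that the number of events in an interval has roughly the expected mean. The rate 3, the horizon 2, the seed and the function names are hypothetical choices.

```python
import random

def waiting_times(arrivals):
    """W_n = T_n - T_(n-1), with the 0th event taken to occur at time 0."""
    previous = 0.0
    waits = []
    for t in arrivals:
        waits.append(t - previous)
        previous = t
    return waits

def simulate_arrivals(rate, horizon, rng):
    """Arrival times in (0, horizon], built from independent Exp(rate) waits."""
    arrivals = []
    t = rng.expovariate(rate)
    while t <= horizon:
        arrivals.append(t)
        t += rng.expovariate(rate)
    return arrivals

waits = waiting_times([1, 1.7, 2.3])
print(waits)  # approximately 1, 0.7 and 0.6, up to floating-point rounding

rng = random.Random(7)  # fixed seed, used only for reproducibility
counts = [len(simulate_arrivals(3.0, 2.0, rng)) for _ in range(20000)]
mean_count = sum(counts) / len(counts)
print(mean_count)  # should be close to rate * horizon = 6
```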


18.6.2 Nonhomogeneous Poisson processes 


In many counting processes, the stationarity assumption is not realistic, as we can expect the 
rate of occurrence to vary with time. To model this, we use a type of process that is similar to 
a Poisson process, except the rate $\lambda$ is no longer a constant but rather a function of $t$. 


Definition 18.8 A counting process is called a nonhomogeneous Poisson process with 
intensity function $\lambda(t)$ if it has independent increments and, for all $t > 0$, 


(i) $P(N(t + h) - N(t) = 1) = \lambda(t)h + o(h)$, 
(ii) $P(N(t + h) - N(t) \geq 2) = o(h)$. 


It can then be shown that for $s < t$, the increment $N(t) - N(s) \sim \text{Poisson}(\psi(s, t))$, where 


$$\psi(s, t) = \int_s^t \lambda(r)\, dr.$$


We omit the proof. 
Note that when $\lambda(t)$ is a constant $\lambda$, then $\psi(s, t)$ is just equal to $\lambda(t - s)$, and we have 
exactly the same conclusion that we had before in the case of a regular Poisson process. 
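
As an illustration, the Poisson parameter of an increment can be approximated by numerical integration of the intensity function. The sketch below uses a midpoint rule; the linear intensity, the constant rate 1.5, the number of steps and the function names are all hypothetical choices made for the example.

```python
import math

def integrated_intensity(intensity, s, t, steps=10_000):
    """Midpoint-rule approximation to the integral of intensity(r) over [s, t]."""
    width = (t - s) / steps
    return width * sum(intensity(s + (k + 0.5) * width) for k in range(steps))

def increment_pmf(k, intensity, s, t):
    """Approximate P(N(t) - N(s) = k) for a nonhomogeneous Poisson process."""
    mean = integrated_intensity(intensity, s, t)
    return math.exp(-mean) * mean**k / math.factorial(k)

def linear(r):
    return 2.0 * r  # hypothetical intensity that grows with time

def constant(r):
    return 1.5      # hypothetical constant intensity

print(integrated_intensity(linear, 1.0, 3.0))    # exact value is 3^2 - 1^2 = 8
print(integrated_intensity(constant, 1.0, 3.0))  # exact value is 1.5 * (3 - 1) = 3
print(increment_pmf(0, constant, 1.0, 3.0), math.exp(-3.0))
```

With a constant intensity the integral reduces to the rate times the length of the interval, in line with the remark above.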


18.7 Brownian motion 


18.7.1 The main definition 


We now look at another continuous-time stochastic process which has several applications. 
This is intended as an introduction and we will not supply all proofs. 

A stochastic process $X(t)$, $0 \leq t < \infty$, is called a Brownian motion process with variance 
parameter $\sigma^2$, for some $\sigma > 0$, if it satisfies the following four conditions: 


(i) $X(0) = 0$; 
(ii) the process has independent and stationary increments; 
(iii) for each $t > 0$, $X(t)$ is normally distributed with mean 0 and variance $\sigma^2 t$; 
(iv) the realizations, $t \to x_t$, are continuous functions of $t$. 


Note that on a formal basis the definition is similar to that of a Poisson process, although the 
nature of the two processes is quite different. Both have independent and stationary increments. 
In the Poisson case, the increments have a Poisson distribution with expectation proportional 
to the length of the interval, and in the Brownian motion case, they have a normal distribution 
with variance proportional to the length of the interval. In the Poisson case, the realizations 
are step functions which involve sudden jumps, in contrast to the continuous realizations of 
Brownian motions. 

A standard Brownian motion is one in which $\sigma^2 = 1$. We will denote this by $B(t)$. For the 
Brownian motion $X(t)$ with variance parameter $\sigma^2$ it is clear that 


$$X(t) = \sigma B(t).$$


18.7.2 Connection with random walks 


A Brownian motion can be viewed as a continuous random walk, where moves are made at 
each instant of time. Consider first our simple random walk of section 18.2.2. We start at 0 
on the real line, and each time unit, we move one unit either to the right or left, with equal 
probability. Let us compute the distribution of $X(t)$, our position at time $t$. It is easy to see that 
this is related to a binomial distribution. Precisely 


$$X(t) \sim 2\,\text{Bin}(t, 1/2) - t, \tag{18.10}$$

from which it is clear that (see (A.41)) 

$$E(X(t)) = 2t(1/2) - t = 0, \qquad \text{Var}(X(t)) = 4t(1/4) = t.$$


We see that our simple random walk has the mean and variance of condition (iii) in the 
definition of a standard Brownian motion. 

Let us now speed up our random walk by making moves more frequently. Instead of 
moving each time unit, let us move every $1/m$ of a time unit, where $m$ is some positive integer. 
Of course, we also want to change the length of each move. One may think that instead of 
moving 1 unit at each step, we should now move $1/m$ of a unit. However, this will not preserve 
the variance condition that we want. To do so, we need to make much larger moves. A key 
fact is that we will need to make the length of each move equal to $1/\sqrt{m}$. So for example, if 
$m = 4$, we would make moves of length 1/2 at times 1/4, 2/4, 3/4, etc., while if $m = 100$, we 
would make moves of length 1/10 at times 1/100, 2/100, etc. Let $X^{(m)}(t)$ denote our position 
at time $t$ under this new arrangement. It is not hard to see that this will be the same as the 
position at time $mt$ of the simple case with moves of 1 unit each time period, only multiplied 
by the length of each move, which is $1/\sqrt{m}$. From (18.10), 


$$X^{(m)}(t) \sim \frac{2}{\sqrt{m}}\,\text{Bin}(mt, 1/2) - \frac{mt}{\sqrt{m}},$$

and we see as above that 


$$E(X^{(m)}(t)) = 0, \qquad \text{Var}(X^{(m)}(t)) = t.$$


The standard Brownian motion $B(t)$ can be viewed as a limiting case of $X^{(m)}(t)$ as $m$ 
approaches $\infty$. 
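
The claim that the sped-up walk keeps mean 0 and variance t for every m can be checked directly from the binomial distribution. The following sketch computes both moments exactly (up to floating-point rounding) by summing over the binomial probabilities. The function name `scaled_walk_moments` is hypothetical, and the code assumes that m and t are positive integers so that mt moves have been made by time t.

```python
import math

def scaled_walk_moments(m, t):
    """Mean and variance of the position at time t when moves of length
    1/sqrt(m) are made every 1/m time units (m and t positive integers)."""
    n = m * t                 # number of moves made by time t
    step = 1 / math.sqrt(m)   # length of each move
    mean = 0.0
    second_moment = 0.0
    for k in range(n + 1):    # k = number of moves to the right
        prob = math.comb(n, k) / 2**n
        position = (2 * k - n) * step
        mean += prob * position
        second_moment += prob * position**2
    return mean, second_moment - mean**2

for m in (1, 4, 100):
    print(m, scaled_walk_moments(m, 4))
```

Had the step length been 1/m rather than its square root, the same calculation would give a variance of t/m, which shrinks to 0 as m grows.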


*18.7.3 Hitting times 


For many stochastic processes we are interested in the time that it first reaches a certain point. 
These random times are known as hitting times. We will deduce hitting time distributions for 
Brownian motion, by invoking one of its key properties, known as the symmetry principle. 
This simply says that if the process is at a point b at time s, then at any later time t the 
distribution must be symmetric about b. In other words, given that B(s) = b, then for any 
numbers h < k, 

$$P(b + h < B(t) < b + k) = P(b - k < B(t) < b - h). \tag{18.11}$$

This feature is clear for the simple random walk, where we are just as likely to move to the 
right as to the left, and therefore is true for Brownian motion by viewing it as a limiting case 
of random walks. 

Let B(t) be a standard Brownian motion, and for any point a, we let 

$$T_a = \inf\{t : B(t) = a\}.$$

To compute the distribution of $T_a$, we will derive a relationship between the distribution 
of $T_a$ and the distribution of B(t). Suppose first that a > 0. Then by Equation (A.30), 

$$P[B(t) \ge a] = P[B(t) \ge a \mid T_a \le t]\,P[T_a \le t] + P[B(t) \ge a \mid T_a > t]\,P[T_a > t].$$

We note first that the second term on the right equals 0. Indeed, if $T_a > t$ then the first time 
the process hits a is after time t, and the value at time t could not be greater than or equal to 
a, since if so, by continuity of the realizations, we would have needed to reach a at or 
before time t. 

Consider now the first term. The quantity $P(B(t) \ge a \mid T_a \le t) = 1/2$, since $T_a \le t$ means 
that the process hit a at some time s prior to t and by the symmetry principle it is just as likely 
to be below as above. Formally, we are applying (18.11) with h = 0, k = $\infty$. This enables us 
to state the desired relationship 

$$P(T_a \le t) = 2P(B(t) \ge a).$$

Since B(t) is normal with mean 0 and variance t, we know that $P(B(t) \ge a) = 1 - \Phi(a/\sqrt{t})$, 
where $\Phi$ is the c.d.f. of the standard normal. Moreover, for a < 0, the distribution of $T_a$ is, by 
symmetry, the same as that of $T_{-a}$, since we start the process at 0. Our final conclusion is then 

$$P(T_a \le t) = 2[1 - \Phi(|a|/\sqrt{t})].$$


This enables us to find the distribution of another random variable of interest. Let M(t) be 
the maximum value that B assumes in the interval [0, t]. This will of course be a nonnegative 
random variable, since we start at 0. For a > 0, we use continuity again to see that M(t) will 
be larger than or equal to a if and only if the process reached a sometime in the interval [0, t], 
which means that $T_a \le t$. So 

$$P(M(t) \ge a) = 2[1 - \Phi(|a|/\sqrt{t})].$$

This is an intuitively appealing result, since it says that for any a > 0, the probability that 
a standard Brownian motion takes a value greater than a at any time before time t is exactly 
twice the probability that it takes a value greater than a at time t. 
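
The hitting-time formula is easy to evaluate numerically. The sketch below is illustrative only (the names `std_normal_cdf` and `hitting_prob` are our own). It builds the standard normal c.d.f. from the error function in Python's standard library and returns $P(T_a \le t)$, which for a > 0 is also $P(M(t) \ge a)$.

```python
from math import erf, sqrt

def std_normal_cdf(x: float) -> float:
    """C.d.f. of the standard normal, written in terms of the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def hitting_prob(a: float, t: float) -> float:
    """P(T_a <= t) = 2[1 - Phi(|a| / sqrt(t))] for a standard Brownian motion."""
    return 2.0 * (1.0 - std_normal_cdf(abs(a) / sqrt(t)))

# Probability that a standard Brownian motion reaches level 1 by time 1
print(round(hitting_prob(1.0, 1.0), 4))
```

Since the formula depends on a and t only through $|a|/\sqrt{t}$, doubling the level has the same effect as dividing the time horizon by 4.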


*18.7.4 Conditional distributions 


Given two times s and t, what is the conditional distribution of B(t), given that B(s) took a certain 
value b? We consider the two cases. 

If s < t, this is relatively straightforward. We can imagine the process just starting all over 
again at time s except with a starting value of b rather than 0. It is then easy to deduce that 


If t > s, (B(t)|B(s) = b) is normal with mean b and variance (t — s). (18.12) 


What if s > t? Now we are asking for the distribution of a random variable conditional 
on future values and the procedure is more complex. Could we perhaps get the same answer 
except with variance (s − t)? Obviously not, since this would be false for t = 0, where we 
know by definition that B(0) takes the value 0 with certainty. It turns out that we have to multiply 
both the mean and the variance by t/s. 

We will work with the density functions and let $f_t$ stand for $f_{B(t)}$. Now given that B(s) = b, 
we will have B(t) = a provided that the increment B(s) − B(t) takes a value of b − a. By 
stationarity, this increment is distributed as B(s − t). Then 

$$f(a \mid B(s) = b) = \frac{f_t(a)\, f_{s-t}(b - a)}{f_s(b)},$$

where we invoke the hypothesis of independent increments for the numerator. Let r = t/s. 
Plugging in the various normal densities and doing some algebra, this reduces to 

$$f(a \mid B(s) = b) = K \exp\left(-\frac{(a - rb)^2}{2r(s - t)}\right)$$

for some constant K, and we can then deduce that in place of (18.12) we have the following. 


If t < s, (B(t)|B(s) = b) is normal with mean rb and variance r(s — t), where r = t/s. 
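
The two cases can be collected into one small helper. This Python sketch is illustrative only (the function name `conditional_bm` is our own). It returns the mean and variance of the normal distribution of B(t) given B(s) = b for a standard Brownian motion, using (18.12) when t > s and the result just stated when t < s.

```python
def conditional_bm(t: float, s: float, b: float) -> tuple[float, float]:
    """Mean and variance of B(t) given B(s) = b, for a standard Brownian motion."""
    if t > s:
        # Conditioning on the past: the process restarts from b at time s
        return b, t - s
    # Conditioning on the future (t < s): scale mean and variance by r = t/s
    r = t / s
    return r * b, r * (s - t)

print(conditional_bm(5.0, 2.0, 3.0))   # t > s
print(conditional_bm(1.0, 4.0, 2.0))   # t < s
```

As t approaches 0 in the second case, both the mean and the variance tend to 0, in agreement with B(0) = 0. As t approaches s the mean tends to b and the variance again tends to 0.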




18.7.5 Brownian motion with drift 


A Brownian motion process is a martingale. We have not defined this precisely for a continuous 
time process, but the idea is similar to that of the discrete time case. If a Brownian motion 
process takes the value b at some point of time, then the expected value at any future point of 
time will be b, as the symmetry principle shows. One often wants to model situations where 
this is not the case and the quantity under study is growing in value. A simple model of this 
type is a Brownian motion with a drift coefficient $\mu$. This is a process which still has stationary 
and independent increments, X(t) is still normally distributed with variance $\sigma^2 t$, but now the 
mean of X(t) is $\mu t$ rather than 0. In other words, on the average the quantity grows at a rate of 
$\mu$ per time period, so in any time period of length h, the quantity can be expected to increase 
by $\mu h$. To state this precisely, X(t) is a Brownian motion with drift coefficient $\mu$ and variance 
parameter $\sigma^2$ if and only if 

$$X(t) \sim \sigma B(t) + \mu t.$$


18.7.6 Geometric Brownian motion 


We are often interested in quantities which grow at a constant relative rather than absolute rate 
of growth. An example is stock prices. We might, for example, expect a stock to grow in value 
at a rate of, say, 5% per year. A possible model in this case is geometric Brownian motion. This 
is a positive valued process X(t) such that Y(t), the logarithm of X(t), is a Brownian motion 
with drift parameter $\mu$ and variance parameter $\sigma^2$. That is, 

$$X(t) = e^{Y(t)} = e^{\mu t + \sigma B(t)}.$$

Note that each X(t) will have a log-normal distribution. 

When $\sigma = 0$, X(t) is simply the amount of a quantity growing exponentially in time at rate 
$\mu$. So a geometric Brownian motion can be viewed as a process that has an underlying pattern 
of exponential growth, with a rate of growth that is not constant but is itself subject to random 
changes as given by a Brownian motion. This is a commonly used model for stock prices in 
modern finance theory and we will apply it in Section 20.18. 

A quantity which is useful in many applications is the conditional expectation of X(t), 
given that X(s) = a for some s < t. To compute this, we first write 

$$X(t) = e^{Y(t)} = e^{Y(s)}\, e^{Y(t) - Y(s)} = X(s)\, e^{Y(t) - Y(s)}.$$

We know that Y(t) − Y(s) is normal with mean $(t - s)\mu$ and variance $(t - s)\sigma^2$, and by 
independent increments it is independent of X(s). So using the expression given in Equation (A.58), 

$$E(X(t) \mid X(s) = a) = a\, e^{(t - s)\mu + (t - s)\sigma^2/2}.$$
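
The conditional expectation is simple to compute. The Python sketch below is an illustration with hypothetical parameter values (the function name `gbm_conditional_mean` and the numbers are our own choices, not taken from the text).

```python
from math import exp

def gbm_conditional_mean(a: float, s: float, t: float, mu: float, sigma: float) -> float:
    """E(X(t) | X(s) = a) for a geometric Brownian motion, with s < t."""
    return a * exp((t - s) * mu + (t - s) * sigma ** 2 / 2)

# Hypothetical inputs: value 100 at time 1, projected to time 3
print(gbm_conditional_mean(100.0, 1.0, 3.0, 0.05, 0.2))
print(gbm_conditional_mean(100.0, 1.0, 3.0, 0.05, 0.0))   # sigma = 0: pure exponential growth
```

With $\sigma > 0$ the expected value exceeds the deterministic growth $a e^{(t-s)\mu}$, because the mean of a log-normal variable includes the extra term $\sigma^2/2$ in the exponent.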


Notes and references 


Ross (2010) has a good basic introduction to stochastic processes, covering more material 
than we do here. The section on Poisson processes is based largely on Chapter 5 of that 
book. Hoel et al. (1972) give a somewhat more advanced treatment. Kemeny and Snell (1963) 
provide extensive coverage of finite-state Markov chains. A proof for the sufficient condition 
given for the existence of a limiting distribution can be found there in Theorem 4.12. Mikosch 
(1998) has more material on Brownian motion, with applications to some of the material we 
discuss in Chapter 20. 


Exercises 


18.1 A certain stock has a price of 100 at time 0. At any time k, let $M_k$ denote the average 
of the prices at times 0, 1, ..., k. The price at time k + 1 will be either $M_k$, $M_k + 20$, 
or $M_k - 20$, each with probability 1/3. What is the distribution of the stock price 
at time 2? What is the expected value at time 2? Is this process a Markov process, 
martingale, submartingale, supermartingale? 

18.2 Suppose that r black balls and r white balls are distributed equally among two urns. 
At each trial, a ball is chosen randomly from each urn and put in the other urn. Let 
$X_n$ be the number of black balls in urn 1 at time n. Describe the transition function 
for this Markov chain. 

18.3 The price of a certain stock is either 9, 10 or 11. If the price is 9 on any day, the next 
day's price will be either 9 or 10 with equal probability. If the price is 11, the 
next day's price will be either 11 or 10 with equal probability. If the price is 10, the 
next day's price will be either 9, 10, or 11 with equal probability. 

(a) Model this as a Markov chain and write down the transition matrix. 
(b) If the price is 9 on Monday, what is the probability that it will be 11 on Friday? 
*(c) Find the proportion of the time, in the long run, that the price will be each of the 
three possible values. 

*18.4 Consider the Markov chain on states 1 to 5 with transition matrix 

2/3 0 0 1/3 0 
0 1/2 0 0 1/2 


1/4 0 3/4 0 0 
1/2 0 0 1/2 0 

Decide whether each state is transient or recurrent. 

*18.5 Consider the Markov chain on states 1 to 4 with transition matrix 

$$\begin{pmatrix} 1/3 & 1/3 & 1/3 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/3 & 0 & 1/3 & 1/3 \\ 0 & 0 & 1/2 & 1/2 \end{pmatrix}.$$

Decide whether each state is transient or recurrent. 


*18.6 Consider the Markov chain on states 1 to 4 with transition matrix 

$$\begin{pmatrix} 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0.3 & 0.7 \\ 0.5 & 0.5 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Decide whether each state is transient or recurrent. 

*18.7 For a two-state Markov chain with transition matrix 

$$\begin{pmatrix} 1 - p & p \\ q & 1 - q \end{pmatrix},$$

find the limiting distribution in terms of p and q. 

*18.8 An indecisive diner enters a restaurant and is taken to a round table with five chairs. 
Unable to decide which chair is best, the diner switches every minute, moving clockwise 
with probability p and counterclockwise with probability 1 − p. We can then 
consider a Markov chain $X_n$, the number of the chair occupied at time n, in minutes. 
Show that the condition ensuring a limiting distribution holds in this case. What is 
the limiting distribution? What can you say if there are four chairs rather than five? 

*18.9 A particle moves around a square with four vertices, numbered 0, 1, 2, 3 going 
clockwise. It moves clockwise with probability 2/3 and counterclockwise with 
probability 1/3. Motion stops when the particle reaches vertex 3. Find the expected 
amount of time the particle will spend at vertex 2, starting from each of vertices 0, 1, 2. 

*18.10 Show that for either a periodic or reducible Markov chain, given any integer n, there 
exists some ordered pair (i, j) such that the entry in row i and column j of $P^n$ is 0. 

*18.11 A golfer has a probability of (1 − 0.1n) of making an n-foot putt, where n is less than 
10. He adopts the following routine for practicing putts from three to seven feet. He 
begins with a three-foot putt. Any time he makes a putt, he tries one which is a foot 
longer and any time he misses, he goes back to a foot shorter putt. If he misses on 
the three-foot putt or makes the seven-foot putt, he repeats that distance. 

(a) If he continues his session for a very long time, what is the most likely distance 
that he will finish with? Estimate the probability of finishing at that distance. 

(b) Repeat part (a) only assuming now that he is practicing putts from four to eight 
feet. 

18.12 In a Poisson process, the probability that exactly one event will occur in any given 
hour is $3e^{-3}$. 

(a) What is the probability that exactly two events will occur in any 20-minute 
period? 

(b) Suppose you start observing the process at some point of time. What is the 
probability that it will be less than 10 minutes until an event occurs? 

(c) Take a unit of time to be 1 hour. Starting at time 0, find the expectation and 
variance of the time of the fifth occurrence. 


Suppose N(t) is a Poisson process such that $P[N(2) = 1] = 6e^{-6}$. 
(a) Find P[N(1) = 2]. 


(b) Identify the distributions (including parameters) of $W_n$, the associated waiting-time 
process, and $T_n$, the time of the nth arrival. 


(c) Find P[N(1) = 1 and N(3) = 3]. 


Vehicles pass a certain marker on a highway in accordance with a Poisson process 
at the rate of 48 per hour; 25% of the vehicles on the road are trucks and 75% are 
cars. Suppose you observe this marker at a particular point of time. 


(a) What is the probability that exactly five vehicles will pass in the first 15 minutes? 


(b) Given that exactly two trucks passed in the first 5 minutes, what is the probability 
that the fourth truck will pass in the first 10 minutes? 


(c) Given that exactly two trucks passed in the first 5 minutes, what is the expected 
value and the variance of the number of vehicles that passed in the first 
10 minutes? 


The following gives information about three counting processes, $N^1(t)$, $N^2(t)$, $N^3(t)$. 
Give reasons why each of these cannot be a Poisson process. 


(a) $P[N^1(2) - N^1(1) = 10] = 0.5$, $P[N^1(3) - N^1(2) = 10] = 0.4$ 
(b) $P[N^2(1) = 1] = 0.5$, $P[N^2(1) = 1 \text{ and } N^2(2) = 2] = 0.2$ 
(c) PIM?) 20] 2 05, P[N?2) = 1] = 0.3 


Consider a Poisson process N(t) with rate 2 per time period. You are given that 
N(1.8) = 4. Using this information, answer each of the following. 


(a) What is the probability that N(2.3) < 6? 
(b) What is the probability that the fifth occurrence will be before time 2? 
(c) What is the expected time and the variance of the seventh occurrence? 


At a subway station, eastbound trains and northbound trains arrive independently, 
both according to a Poisson process. On average, there is one eastbound train every 
12 minutes and one northbound train every 8 minutes. Suppose you arrive at the 
subway station at a certain point of time and start observing trains. 


(a) What is the probability that exactly two eastbound trains will arrive in the first 
24 minutes? 


(b) What is the probability that exactly two eastbound trains will arrive in the first 
24 minutes and exactly three eastbound trains will arrive in the first 36 minutes? 


(c) What is the expected waiting time, in minutes, until the first train (of either type) 
arrives? 


(d) What is the probability that the first train to arrive is eastbound, and the next two 
are northbound? 


(e) What is the probability that it will take at least 20 minutes for two northbound 
trains to arrive? 


Events occur according to a Poisson process. Suppose that the expected number of 
occurrences per hour is 3, and let $T_n$ denote the time in hours of the nth occurrence. Find 
the expected value and variance of $T_3$ given that the first occurrence was at time 1 
and the second occurrence was at time 2. 


In a Poisson process, the probability is 0.60 that after an occurrence of the event, it 
will take at least 2 months until the next event. Find the probability that exactly four 
claims will occur within any 5-month period. 


An insurer finds that out of a certain group of insured drivers, the accident rate 
over each 24-hour period rises from midnight to noon, and then declines until the 
following midnight. They decide that the number of accidents can be modelled 
by a nonhomogeneous Poisson process where the intensity at time t is given by 
$[1/6 - (12 - t)^2/1152]$, where t is the number of hours since midnight. 

Find: (a) the expected number of daily accidents; (b) the probability that there 
will be exactly one accident between 6:00 a.m. and 6:00 p.m. 


Suppose that X(t) is a Brownian motion with variance parameter 0.03. If X(2) = 3, 
find the probability that X(5) > 3.5. 


For a certain security, each 1 unit invested now will yield the random amount $e^{0.7B(t)}$ at 
time t, where B(t) is a standard Brownian motion. What is the probability that sometime 
within the next five time periods, an initial investment will have doubled in value? 


The price of a stock at time t is given by 
S(t) = 15e”! 935) 
where B(t) is a standard Brownian motion. The quantity $\mu$ is unknown. An investor 


buys the stock at time 3 and sells it for 20 a share at time 5. What is the probability 
that the investor lost money on this transaction? 


19 


Multi-state models 


19.1 Introduction 


Multi-state models are an attempt to look at a variety of life insurance and annuity contracts in 
a unified manner, by making use of Markov processes. As motivation, take an individual now 
age x and consider a two-state Markov chain, where the person is in state 0 (alive) or state 1 
(deceased) at any time. Life insurance contracts provide benefits upon transfer from state 0 to 
state 1, while life annuity contracts provide benefits as long as the process remains in state 0. 

More generally, consider a multiple-decrement model for (x) with m causes of failure. We 
can consider a chain with m + 1 states. State 0 means that (x) has not succumbed to any cause 
and is often referred to as the active state. State j refers to having succumbed first to cause 
j. The insurance benefits discussed in Chapter 11 can be viewed as payments upon transfer 
from state 0 to other states. 

For still another example, consider a joint-life contract issued to (x) and (y). We can now 
take a chain with four states as illustrated in Figure 19.1. The arrows indicate that there are 
possible transitions from state 0 into state 1 or state 2, occasioned by the death of (y) or (x) 
respectively, and then further transitions from state 1 or state 2 into state 3 when the second 
death occurs. The dotted line, showing a transition directly from state 0 to state 3, would not 
be present in our original model, but would be there if we wanted to consider a common shock 
model as discussed in Section 17.6. A joint life insurance can be considered as two contracts, 
one paying benefits upon transfer from state 0 to state 1, and the other paying benefits upon 
transfer from state 0 to state 2. A general two-life annuity can be considered as three separate 
contracts, where the ith contract, i = 0, 1, 2, pays benefits provided the process is in state i. 
This can be generalized to contracts involving n lives where we will have $2^n$ states. 

We can of course imagine more general patterns of transition. We may wish to investigate 
a more enriched multiple-decrement model where individuals can transfer between states 
several times. A disabled person might recover and re-enter the main group of lives. In our 
original model we ignored what happened to a life once it left the group for any cause, but we 
may wish to model the fact that someone leaving for a cause other than death will subsequently 
die. There are many examples of insurance and annuity benefits applicable to this general 
case. When disability is one of the decrements, we may have a contract that pays benefits 
when a person becomes disabled, and then further benefits when a disabled person dies, and 
possibly additional payments that continue during disability. Indeed, a common provision in 
many life insurance policies is a disability premium waiver clause, which means the person 
does not have to pay premiums during the time they are disabled. We view this as an annuity 
providing payments during a state of disability, which cease when there is a transfer back to 
an active state. These are only some of the possibilities, and we invite the reader to think of 
additional applications. 

Figure 19.1 A two life multi-state model (State 0: both alive; State 1: (x) only; State 2: (y) only; State 3: neither) 

In this chapter, we will investigate the general multi-state model. We first discuss the 
discrete-time model, where transitions between states can occur only at integer times. Fol- 
lowing that we deal with the more complicated continuous-time model, where we allow for 
transitions at arbitrary times. 


19.2 The discrete-time model 


19.2.1 Non-stationary Markov Chains 


The transitions in multi-state models are normally age-dependent. To handle this, we need to 
extend the concepts and notation that we introduced in Section 18.4.1 so that they apply to 
the non-stationary case. 

For a non-stationary Markov chain, in place of the single transition matrix P, we need for 
each nonnegative integer n a matrix $P_n$ whose ijth entry is $p_{ij}(n, n + 1)$. 

The main quantities of interest are the probabilities $p_{ij}(k, n)$ of being in state j at time n 
conditional upon being in state i at time k. In deriving (18.5), we do not need the fact that the 
matrices are the same, and the argument given is easily adapted to show that 

$$p_{ij}(k, k + n) \text{ is the } ij\text{th entry of } P_k P_{k+1} \cdots P_{k+n-1}, \tag{19.1}$$

which reduces to (18.5) in the stationary case. Equation (18.6) now takes the form 

$$\pi_{k+n} = \pi_k P_k P_{k+1} \cdots P_{k+n-1}.$$

For example, in our simple life-death chain the transition matrices are given by 

$$P_n = \begin{pmatrix} p_{x+n} & q_{x+n} \\ 0 & 1 \end{pmatrix}.$$

In the multiple decrement example, the matrix $P_n$ will be similar to that above, except 
the first row will be $(p^{(\tau)}_{x+n}, q^{(1)}_{x+n}, q^{(2)}_{x+n}, \ldots, q^{(m)}_{x+n})$, and each other row will have 1 on the main 
diagonal and 0s elsewhere. 

We leave it as an exercise for the reader to write down the matrix $P_n$ in the joint two life 
case. 

We now introduce a multi-state model which is not covered by our previous cases of 
multiple lives or multiple-decrement theory, and to which we will refer frequently in the 
rest of the chapter. It allows for a two-way transition between certain states, which is really 
where the generality of the multi-state model is the most helpful. This is the chain as depicted 
in Figure 19.2 where a life can be healthy or unhealthy, but we now allow for recovery from 
the unhealthy state. 

Figure 19.2 The healthy-unhealthy-deceased model (State 0: healthy; State 1: unhealthy; State 2: deceased) 

There are various applications. The unhealthy state could refer to being disabled, or sick, 
or a number of other possibilities for a state which we wish to distinguish somehow from the 
main group of lives, but which can transfer back to the main group. 


Example 19.1 Consider the model of Figure 19.2 and suppose that transition matrices for 
times 0, 1, 2 are given as follows: 

$$P_0 = \begin{pmatrix} 0.7 & 0.2 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0 & 0 & 1 \end{pmatrix}, \quad P_1 = \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.1 & 0.6 & 0.3 \\ 0 & 0 & 1 \end{pmatrix}, \quad P_2 = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.1 & 0.5 & 0.4 \\ 0 & 0 & 1 \end{pmatrix}.$$

What is the probability that a person who is healthy at time 0 will be unhealthy at time 2 and 
deceased at time 3? 


Solution. 

$$P_0 P_1 = \begin{pmatrix} 0.37 & 0.33 & 0.30 \\ 0.16 & 0.42 & 0.42 \\ 0 & 0 & 1 \end{pmatrix}.$$

Using formula (19.1), the required probability equals 

$$p_{01}(0, 2)\, p_{12}(2, 3) = 0.33 \times 0.4 = 0.132.$$
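
The matrix product in this solution can be reproduced in a few lines. The Python sketch below is illustrative only (the helper `matmul` and the variable names are our own). It applies formula (19.1) to the matrices of Example 19.1.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Transition matrices of Example 19.1 (states: 0 healthy, 1 unhealthy, 2 deceased)
P0 = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.0, 0.0, 1.0]]
P1 = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.0, 0.0, 1.0]]
P2 = [[0.4, 0.3, 0.3], [0.1, 0.5, 0.4], [0.0, 0.0, 1.0]]

P01 = matmul(P0, P1)            # entries are p_ij(0, 2), by (19.1)
prob = P01[0][1] * P2[1][2]     # p_01(0, 2) * p_12(2, 3)

print([round(x, 2) for x in P01[0]])
print(round(prob, 3))
```

The same helper gives the three-step matrix as `matmul(P01, P2)`.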


19.2.2 Discrete-time multi-state insurances 


To model the general contract of this type, we will take a Markov chain with states numbered 
from 0 to N. Suppose the process begins in state a at time 0. Fix any two states i and j, and 
consider a contract which pays a benefit of $b_k$ at time k + 1, provided a transfer from state i to 
state j occurs between time k and time k + 1. That is, the process was in state i at time k and 
state j at time k + 1. 


Notation If we have several different transfer benefits we will distinguish them by writing $b_k$ 
as $b_k^{ij}$. In general, for quantities like benefits, premiums, reserves and expenses, we will let 
superscripts refer to states, and maintain our previous usage of subscripts to refer to time. 
This differs from our convention for probabilities where we use subscripts to refer to states, 
and write times in brackets. 


A convenient method of discussing multi-state benefits in the context of the stochastic 
model is to make use of indicator random variables. For each nonnegative integer k, let $I_k$ be 
the random variable that takes the values of 0 or 1 respectively, according as a transfer from 
i to j has not or has occurred between time k and k + 1. The present value of the benefits on 
the contract is given by 

$$Z = \sum_{k=0}^{\infty} b_k\, v(k + 1)\, I_k. \tag{19.2}$$

Since 

$$E(I_k) = p_{ai}(0, k)\, p_{ij}(k, k + 1), \tag{19.3}$$

the actuarial present value is 

$$\sum_{k=0}^{\infty} b_k\, v(k + 1)\, p_{ai}(0, k)\, p_{ij}(k, k + 1). \tag{19.4}$$


We again have our familiar pattern with a sum of three-termed factors. In fact, we can consider 
each summand as a four-termed factor where the probability of receiving a payment is itself 
the product of two factors, the probability of transition to state i by time k and the probability 
of transition from state i to state j in the next period. 


Example 19.2 Consider the chain given in Example 19.1. A 3-year contract written on a 
healthy life provides for benefits at the end of the year of death. The death benefit is 3 if the 
person dies while healthy and 2 if the person dies while unhealthy. If interest is a constant 
5%, find the actuarial present value. 


Solution. We can view this as two separate contracts, one paying 3 for transfer from state 0 
to state 2, and the second paying 2 for transfer from state 1 to state 2. We use the matrices in 
Example 19.1 and formula (19.1). For the first contract, 

$$p_{02}(0, 1) = 0.1, \quad p_{00}(0, 1)\,p_{02}(1, 2) = 0.14, \quad p_{00}(0, 2)\,p_{02}(2, 3) = 0.37 \times 0.3 = 0.111,$$

so that 

$$\mathrm{APV} = 3[0.1(1.05)^{-1} + 0.14(1.05)^{-2} + 0.111(1.05)^{-3}] = 0.9543.$$

For the second contract, 

$$p_{01}(0, 1)\,p_{12}(1, 2) = 0.06, \quad p_{01}(0, 2)\,p_{12}(2, 3) = 0.33 \times 0.4 = 0.132,$$

so that 

$$\mathrm{APV} = 2[0.06(1.05)^{-2} + 0.132(1.05)^{-3}] = 0.3369.$$

Therefore, the total APV is 1.2912. 
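
Formula (19.4) lends itself to a short loop that carries the state distribution forward one period at a time. The Python sketch below is illustrative only (the function name `transfer_apv` and its argument order are our own, and a constant interest rate is assumed). It recomputes the two contracts of Example 19.2.

```python
def transfer_apv(mats, start, i, j, benefit, rate):
    """APV of a level benefit paid at time k + 1 on a transfer from i to j, as in (19.4)."""
    v = 1.0 / (1.0 + rate)
    n = len(mats[0])
    dist = [1.0 if s == start else 0.0 for s in range(n)]   # p_{a r}(0, 0)
    apv = 0.0
    for k, P in enumerate(mats):
        # Probability of being in i at time k, then moving to j by time k + 1
        apv += benefit * v ** (k + 1) * dist[i] * P[i][j]
        # Advance the state distribution to time k + 1
        dist = [sum(dist[r] * P[r][c] for r in range(n)) for c in range(n)]
    return apv

# Transition matrices of Example 19.1
P0 = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.0, 0.0, 1.0]]
P1 = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.0, 0.0, 1.0]]
P2 = [[0.4, 0.3, 0.3], [0.1, 0.5, 0.4], [0.0, 0.0, 1.0]]
mats = [P0, P1, P2]

apv_healthy_death = transfer_apv(mats, 0, 0, 2, 3.0, 0.05)     # benefit 3, state 0 to 2
apv_unhealthy_death = transfer_apv(mats, 0, 1, 2, 2.0, 0.05)   # benefit 2, state 1 to 2
print(round(apv_healthy_death, 4), round(apv_unhealthy_death, 4))
```

Adding the two values reproduces the total APV of the example to four decimal places.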

Example 19.3 Take the same chain as in the previous example. Consider a contract that 
pays 1 at the end of any year of transfer from being healthy to being unhealthy, provided this 
occurs within the next 3 years. Find the expected value and variance of the benefits. 

Solution. We now have 

$$p_{01}(0, 1) = 0.2, \quad p_{00}(0, 1)\,p_{01}(1, 2) = 0.21, \quad p_{00}(0, 2)\,p_{01}(2, 3) = 0.37 \times 0.3 = 0.111,$$

so that 

$$\mathrm{APV} = 0.2(1.05)^{-1} + 0.21(1.05)^{-2} + 0.111(1.05)^{-3} = 0.4768.$$

The variance calculation cannot be done conveniently by adapting a version of formula 
(15.4). The situation here is different from the multiple decrement model where benefits are 
paid upon transition to an absorbing state. In this case, there can be benefits paid at more than 
one time. We must instead use the approach taken in the current payment formula for annuity 
variances. Note however that the covariance calculations will be more complicated than that 
given by (15.11). Letting $I_k$ take the value 1 if transfer from the healthy state to the unhealthy 
state occurs between time k and k + 1, we have 

$$E(I_0) = 0.2, \quad E(I_1) = 0.21, \quad E(I_2) = 0.111, \quad E(I_0 I_1) = E(I_1 I_2) = 0,$$

while 

$$E(I_0 I_2) = p_{01}(0, 1)\,p_{10}(1, 2)\,p_{01}(2, 3) = 0.006,$$

which is the probability of becoming unhealthy, recovering, and then becoming unhealthy 
again, thereby being paid at both time 1 and time 3. We can then calculate the covariances as 

$$\mathrm{Cov}(I_0, I_1) = -0.042, \quad \mathrm{Cov}(I_0, I_2) = -0.0162, \quad \mathrm{Cov}(I_1, I_2) = -0.02331,$$

leading to the calculation of the variance as 

$$(0.2)(0.8)(1.05)^{-2} + (0.21)(0.79)(1.05)^{-4} + (0.111)(0.889)(1.05)^{-6} - 2[0.042(1.05)^{-3} + 0.0162(1.05)^{-4} + 0.02331(1.05)^{-5}] = 0.2195.$$
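
For a chain this small, the mean and variance can also be checked by brute force. The Python sketch below swaps the indicator-covariance method for a direct enumeration of every possible path of the chain over the three years. It is an illustration only: the function name `pv_moments` is our own, and the benefit is fixed at 1 per qualifying transfer.

```python
from itertools import product

def pv_moments(mats, start, i, j, rate):
    """Mean and variance of the PV of 1 paid at the end of each period with a transfer i -> j."""
    v = 1.0 / (1.0 + rate)
    n_states = len(mats[0])
    mean = second = 0.0
    # Enumerate every sequence of states occupied at times 1, ..., len(mats)
    for path in product(range(n_states), repeat=len(mats)):
        states = (start,) + path
        prob, pv = 1.0, 0.0
        for k, P in enumerate(mats):
            prob *= P[states[k]][states[k + 1]]
            if states[k] == i and states[k + 1] == j:
                pv += v ** (k + 1)          # benefit of 1 paid at time k + 1
        mean += prob * pv
        second += prob * pv ** 2
    return mean, second - mean ** 2

# Transition matrices of Example 19.1
P0 = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.0, 0.0, 1.0]]
P1 = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.0, 0.0, 1.0]]
P2 = [[0.4, 0.3, 0.3], [0.1, 0.5, 0.4], [0.0, 0.0, 1.0]]

mean, var = pv_moments([P0, P1, P2], 0, 0, 1, 0.05)
print(round(mean, 4), round(var, 4))
```

Enumeration grows exponentially with the term of the contract, so it is only a check on small examples. The covariance approach of the solution is the one that scales.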


Suppose we wished to modify the contract of the previous example so that it paid off 
only on the first occurrence of becoming unhealthy. In our simple example, this could be done 
readily enough by changing the probability of a benefit payment at time 3 to 0.105, subtracting 
0.006, which we have calculated above as the probability of the second occurrence. To handle 
the problem in general, we need a more systematic approach. Consider a contract paying 
benefits upon the first transition from a state $i_0$ to a state $j_0$. A method that works here (as 
well as for many other problems) is to add additional states. 

The easiest way is to add a single new absorbing state z, redirect all transfers from state $i_0$ 
to state $j_0$ to z, and modify the contract to pay upon a transfer from $i_0$ to 
z. To state this precisely, let $p^*$ denote probabilities in the augmented chain. To simplify the 
notation, fix a time n and write $p_{ij}$ and $p^*_{ij}$ for $p_{ij}(n, n + 1)$ and $p^*_{ij}(n, n + 1)$ respectively. Then 
$p^*$ is the same as p except that 

$$p^*_{i_0 j_0} = 0, \quad p^*_{i_0 z} = p_{i_0 j_0}, \quad p^*_{zz} = 1, \quad p^*_{iz} = 0 \ (i \ne i_0, z), \quad p^*_{zj} = 0 \ (j \ne z).$$

There is a more involved method which has the advantage that it preserves relationships 
between other states that might be needed in more complex problems. This is to add N + 1 new 
states $i^*$, i = 0, ..., N, where state $i^*$ can be considered as a sort of 'clone' of state i. Transitions 
between un-starred states are as before except that transitions from $i_0$ to $j_0$ are directed to $j_0^*$ in 
place of $j_0$. Except for that, there are no transitions between the two types of states, starred and 
un-starred. Transitions between starred states are as given by the corresponding transitions 
between their clones. In this case, $p^*_{i_0 j_0} = 0$, $p^*_{i_0 j_0^*} = p_{i_0 j_0}$, and for all ordered pairs (i, j) not 
equal to $(i_0, j_0)$, 

$$p^*_{ij} = p_{ij}, \quad p^*_{i j^*} = 0, \quad \text{while for all } (i, j), \quad p^*_{i^* j^*} = p_{ij}, \quad p^*_{i^* j} = 0. \tag{19.5}$$

For example, if our original matrix for a two state model is given by 

$$P = \begin{pmatrix} a & 1-a \\ b & 1-b \end{pmatrix}$$

and we take $i_0 = 0$, $j_0 = 1$, then the matrix of the augmented chain, using the order 0, 1, 0*, 1*, 
is 

$$P^* = \begin{pmatrix} a & 0 & 0 & 1-a \\ b & 1-b & 0 & 0 \\ 0 & 0 & a & 1-a \\ 0 & 0 & b & 1-b \end{pmatrix}.$$

In addition to this, the provisions of the insurance are modified to pay upon a transfer from 
state $i_0$ to state $j_0^*$. After the first occurrence of such an event, there can never be a return to 
state $i_0$, so subsequent occurrences are ruled out. 
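
The construction of the clone-augmented matrix is mechanical, as the following Python sketch suggests. It is an illustration only: the function name `augment_with_clones` is our own, and we assume the augmented states are ordered 0, ..., N followed by 0*, ..., N*, as in the two-state example.

```python
def augment_with_clones(P, i0, j0):
    """Build the augmented one-step matrix, states ordered 0..N then 0*..N*."""
    n = len(P)
    Q = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if (i, j) == (i0, j0):
                Q[i][n + j] = P[i][j]     # redirect i0 -> j0 to the clone j0*
            else:
                Q[i][j] = P[i][j]         # other un-starred transitions unchanged
            Q[n + i][n + j] = P[i][j]     # starred states copy their clones
    return Q

# Two-state example with illustrative values a = 0.7, b = 0.4
P = [[0.7, 0.3], [0.4, 0.6]]
for row in augment_with_clones(P, 0, 1):
    print(row)
```

In the non-stationary case the same function would be applied to each $P_n$ in turn. The benefit is then attached to the transfer from row $i_0$ to column $j_0^*$.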




Remark Solving multi-state problems with large matrices can involve a great deal of 
calculation. There are a number of computing packages that can assist with this. One convenient 
example is the MMULT function of Excel®. 


19.2.3 Multi-state annuities 


We now look at annuity contracts associated with a chain. Suppose again that the individual 
begins in state a at time 0. For a fixed state i, we can consider a contract that pays $c_k$ at time 
k provided the person is in state i at that time. Let $I_k$ now be the indicator random variable 
that takes the value of 1 or 0 according as the process is or is not in state i at time k. Then the 
present value of the benefits is the random variable 

$$Y = \sum_{k=0}^{\infty} c_k\, v(k)\, I_k,$$

from which we calculate immediately 

$$\mathrm{APV} = \sum_{k=0}^{\infty} c_k\, v(k)\, p_{ai}(0, k). \tag{19.6}$$

To derive variances, we calculate for all m < n 

$$E(I_m I_n) = p_{ai}(0, m)\, p_{ii}(m, n).$$

From this, we calculate the covariances and then proceed as in Example 19.3. 


Example 19.4 Consider again the chain given in Example 19.1. A contract on a healthy 
life provides for four yearly payments beginning at time 0. The amount of the payment is 1 
if the person is healthy, or 2 if the person is unhealthy. If interest is a constant 5%, find the 
actuarial present value. 


Solution. We view this as two separate annuities. Contract 1 is for state 0 (healthy) and contract 2 is 
for state 1 (unhealthy). In addition to the matrix $P_0 P_1$ calculated in Example 19.1, we need 

$$P_0 P_1 P_2 = \begin{pmatrix} 0.181 & 0.276 & 0.543 \\ 0.106 & 0.258 & 0.636 \\ 0 & 0 & 1 \end{pmatrix}.$$

The APV of contract 1 is 

$$1 + \frac{0.7}{1.05} + \frac{0.37}{1.05^2} + \frac{0.181}{1.05^3} = 2.1586,$$

and that of contract 2 is 

$$2\left(\frac{0.2}{1.05} + \frac{0.33}{1.05^2} + \frac{0.276}{1.05^3}\right) = 1.4564,$$

giving a total APV of 3.6150. 


Example 19.5 Suppose the contract in Example 19.2 is to be purchased by three level 
annual premiums of z, which are payable only if the person is healthy. Calculate z. 


Solution. Equating values of premiums and benefits, 

$$z\,(1 + (1.05)^{-1}(0.7) + (1.05)^{-2}(0.37)) = 1.2912,$$

which gives z = 0.6449. 
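
Both annuity examples follow the same pattern as (19.6) and can be checked with one routine. The Python sketch below is illustrative only (the function name `state_annuity_apv` and its arguments are our own, and a constant interest rate is assumed). It recomputes the two annuities of Example 19.4 and the premium of Example 19.5, taking the benefit value 1.2912 from Example 19.2.

```python
def state_annuity_apv(mats, start, state, payment, rate, n_payments):
    """APV of `payment` paid at times 0, ..., n_payments - 1 while in `state`, as in (19.6)."""
    v = 1.0 / (1.0 + rate)
    n = len(mats[0])
    dist = [1.0 if s == start else 0.0 for s in range(n)]   # distribution at time 0
    apv = 0.0
    for k in range(n_payments):
        apv += payment * v ** k * dist[state]
        if k < n_payments - 1:
            P = mats[k]                                      # advance to time k + 1
            dist = [sum(dist[r] * P[r][c] for r in range(n)) for c in range(n)]
    return apv

# Transition matrices of Example 19.1
P0 = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.0, 0.0, 1.0]]
P1 = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.0, 0.0, 1.0]]
P2 = [[0.4, 0.3, 0.3], [0.1, 0.5, 0.4], [0.0, 0.0, 1.0]]
mats = [P0, P1, P2]

healthy = state_annuity_apv(mats, 0, 0, 1.0, 0.05, 4)      # Example 19.4, contract 1
unhealthy = state_annuity_apv(mats, 0, 1, 2.0, 0.05, 4)    # Example 19.4, contract 2
premium = 1.2912 / state_annuity_apv(mats, 0, 0, 1.0, 0.05, 3)   # Example 19.5
print(round(healthy, 4), round(unhealthy, 4), round(premium, 4))
```

The premium is simply the benefit APV divided by the APV of a three-payment annuity of 1 per year payable while healthy.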


There are other types of annuity contracts that can be handled by the trick of adding states. 
Suppose, for example, a contract provides periodic annuity benefits starting when the process 
enters state $i_0$ (or starting at time 0 if the process begins in state $i_0$), but which stop completely 
when the process leaves state $i_0$, even if there is a subsequent return. The easiest approach 
here is to add an absorbing state z and direct all transitions from state $i_0$ to other states 
to state z, ensuring the process never returns to $i_0$. If we want to preserve information about 
other transitions, the method is to add a clone for each state as we did in the insurance case. 
Transitions from state $i_0$ to any state $j \ne i_0$ are redirected to the clone $j^*$, also ensuring that 
the process will never return to state $i_0$. 


19.3 The continuous-time model 


In a realistic situation, transitions between states need not occur at discrete intervals, but 
can happen at any point of time, and the benefits payable upon transition from one state to
another will be paid at the moment of transition. To model this situation, we need to
consider continuous-time Markov processes. We still keep a finite number of states, but let 
time vary continuously. The definition parallels that which we make in Section 18.2, only we 
must consider all possible points of time. (We will follow the notational usage introduced in 
Section 18.5 of writing the time variable in brackets.) 


Definition 19.1 A continuous-time Markov process is a stochastic process $\{X(t) : 0 \le t < \infty\}$,
where each X(t) is discrete, with the property that given any sequence of times $0 \le t_0 < t_1 < \cdots < t_n < t_{n+1}$
and any sequence $(x_0, x_1, \ldots, x_n, x_{n+1})$, where $x_i$ is a value of $X(t_i)$, we have

$$P(X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n, X(t_{n-1}) = x_{n-1}, \ldots, X(t_0) = x_0) = P(X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n).$$

As before, for times $s \le t$ and any two states i, j we let

$$p_{ij}(s, t) = P(X(t) = j \mid X(s) = i),$$

which is the probability of reaching state j at time t when starting in state i at time s.


19.3.1 Forces of transition 


With the continuous time framework, we are able to define analogues of the force of mortality 
in our original life-death model, and the forces of decrement in the multiple-decrement model. 
As we have seen in earlier examples, the process is often specified by giving these forces, and 
then the central problem is to use these to calculate the desired probabilities. 

Definition 19.2 We define the force of transition from state i to state j at time t as follows.
If $i \neq j$,

$$\mu_{ij}(t) = \lim_{h \to 0^+} \frac{p_{ij}(t, t+h)}{h},$$

while

$$\mu_{ii}(t) = \lim_{h \to 0^+} \frac{p_{ii}(t, t+h) - 1}{h}.$$

Note that $\mu_{ii}$ is always nonpositive, which may seem a bit strange at first, but it is easily
explained if we consider, for example, the multiple-decrement case where state 0 refers to an
active life. The negativity of $\mu_{00}$ reflects the action of the other forces of decrement, which
cause transfer out of state 0.


An important fact is the following. 


Theorem 19.1

$$\sum_{j=0}^{N} \mu_{ij}(t) = 0, \quad \text{for all } i \text{ and } t.$$


Proof. Fix any i and t. Refer to the little o notation introduced in Section 18.3. From the
definitions above, for all states j and h > 0,

$$p_{ij}(t, t+h) = h\mu_{ij}(t) + o(h), \quad i \neq j, \tag{19.7}$$

and

$$p_{ii}(t, t+h) = h\mu_{ii}(t) + o(h) + 1. \tag{19.8}$$

Summing over all j, and noting that the left sides sum to 1, gives

$$1 = h \sum_{j=0}^{N} \mu_{ij}(t) + o(h) + 1.$$

Subtract 1 from each side, divide by h and take a limit as h approaches 0 to establish the
conclusion.


We now want to relate the forces of transition to the transition probabilities. This is much 
more complicated in our general setting than in the simple cases we looked at previously. 


Theorem 19.2 (Kolmogorov forward equations) For any states i, j and times $s \le t$,

$$\frac{\partial}{\partial t} p_{ij}(s, t) = \sum_{k=0}^{N} p_{ik}(s, t)\,\mu_{kj}(t). \tag{19.9}$$


Proof. We start by deriving the discrete version of the above formula. Following the reasoning
used in Equation (18.4), we can deduce that for states i, j, times $s \le t$ and h > 0,

$$p_{ij}(s, t+h) = \sum_{k=0}^{N} p_{ik}(s, t)\,p_{kj}(t, t+h), \tag{19.10}$$

reflecting the fact that in order to reach state j from state i in the time interval from s to t + h we must
first reach some state k by time t and then go from state k to state j in the next h time units.
The system of equations given by (19.10) is known as the Chapman-Kolmogorov equations.
Substituting from (19.7) and (19.8) into the second factor of the summand,

$$p_{ij}(s, t+h) = h \sum_{k=0}^{N} p_{ik}(s, t)\,\mu_{kj}(t) + p_{ij}(s, t) + o(h).$$

Subtracting $p_{ij}(s, t)$ from both sides, dividing by h and taking a limit gives (19.9).


We can give an intuitive explanation of (19.9) paralleling that given in the discrete case. 
View each side as a type of density function for reaching state j at time t when starting from 
state i at time s. The right side reflects the fact that in order to accomplish this, we must first 
reach some state k at time t and then at that instant of time, transfer from state k to state j (or
in the case that k = j, not transfer back into some other state). 

We view the statement of the theorem as a system of differential equations in the variable t 
for a fixed value of s. The word ‘forward’ arises since our transition is moving forward in time. 
There is a related system, the corresponding backwards equations, which will be developed 
in Exercise 19.6. 

There will normally be many solutions to a system of differential equations of this type.
There will, however, often be a unique solution if we specify appropriate initial conditions. In
our context, these conditions are the obvious requirements that for all s,

$$p_{ii}(s, s) = 1, \qquad p_{ij}(s, s) = 0 \quad \text{for } j \neq i.$$

We have of course encountered versions of the Kolmogorov equations before. Take s = 0.
The simplest case with N = 1 gives Equation (8.15). In the multiple decrement model, the
Kolmogorov equations produce Equation (11.10). These statements may not be completely 
obvious since the notation is somewhat different. It will be instructive for the reader to verify 
these claims. 

There is a nice matrix formulation of (19.9). Fixing s, t we let P denote the transition
matrix for the time interval (s, t). This is the matrix with an entry in the ith row and jth column
of $p_{ij}(s, t)$, as we had in the discrete case. We also have a corresponding matrix M in which
the entry in the ith row and jth column is $\mu_{ij}(t)$. M is often referred to as the intensity matrix.
We let P' denote the matrix in which each entry is the partial derivative with respect to t of
the corresponding entry in P. The Kolmogorov forward equations together with the initial
conditions can then be expressed as

$$P' = PM, \tag{19.11}$$

and

$$P = I \quad \text{for } s = t. \tag{19.12}$$


The following is a simple but useful consequence of the forward equations. It generalizes
a result from our life-death model, when we have a state, such as that of being alive, which
cannot be entered from any other state. That is, for all $j \neq i$ and $s \le t$, we have $p_{ji}(s, t) = 0$.
We will refer to such a state as being anti-absorbing.
In the following theorem, we adopt a formally weaker hypothesis involving forces rather
than probabilities, which by the second statement turns out to be equivalent.


Theorem 19.3 Let i be a state such that $\mu_{ji}(t) = 0$ for all $j \neq i$ and all t. Then, for $s \le t$,

$$p_{ii}(s, t) = e^{\int_s^t \mu_{ii}(r)\,dr}$$

and

$$p_{ji}(s, t) = 0, \quad \text{for } j \neq i.$$

Proof. From Theorem 19.2 we get, for any state j,

$$\frac{\partial}{\partial t} p_{ji}(s, t) = p_{ji}(s, t)\,\mu_{ii}(t), \tag{19.13}$$

a familiar differential equation which we first encountered in (8.15) and which is easily solved
to give

$$p_{ji}(s, t) = K(s)\,e^{\int_s^t \mu_{ii}(r)\,dr}$$

for some function K(s) independent of t. Since $p_{ii}(s, s) = 1$, and $p_{ji}(s, s) = 0$ for $j \neq i$, we must
have K(s) = 1 if j = i or 0 if $j \neq i$, and the conclusion follows.


Another quantity of interest is the so-called sojourn probability for a state. We define

$$p_{\overline{ii}}(s, t) = P[X(r) = i,\ s < r \le t \mid X(s) = i].$$

Note that the sojourn probability will in general differ from $p_{ii}(s, t)$, as the corresponding
event requires the process to remain in state i for an entire interval of time, rather than just at
the endpoints. The two probabilities will, however, be the same for an anti-absorbing state.

Since sojourn probabilities are featured prominently in the next section, we will attempt
to find a way to evaluate them. Let

$$q_i(s, t) = p_{ii}(s, t) - p_{\overline{ii}}(s, t).$$

This is the probability that starting in state i at time s, the process is back in state i at time t,
having left state i sometime during the interval. Let

$$\nu_i(t) = \lim_{h \to 0^+} \frac{q_i(t, t+h)}{h}.$$

Now we employ once again the technique of adding states. Given a process and a state i, 
we add a single new state i* which is a clone of state i. Transitions out of state i* follow the 
same pattern as transitions out of state i, while transitions that originally came into state i will 
be diverted to the cloned state i*. Now to say that the new process is in state i at a time s and
also at a later time t means that it must have stayed there throughout as it cannot re-enter. In
addition, the forces of transition out of state i remain the same in both processes. So we have
that


$$p^*_{ii}(s, t) = p_{\overline{ii}}(s, t), \qquad \mu^*_{ij} = \mu_{ij}, \quad j \neq i, i^*,$$

where an asterisk marks a quantity in the new process. We do have to consider the quantity $\mu^*_{ii^*}$. Now

$$p^*_{ii^*}(s, t) = q_i(s, t),$$

since the new process will move from state i to state i* precisely when the original process
moves from state i back to state i having left state i at some point. From Definition 19.2,

$$\mu^*_{ii^*}(t) = \nu_i(t).$$

Applying Theorem 19.3 to state i in the new process then leads to the formula

$$p_{\overline{ii}}(s, t) = e^{\int_s^t (\mu_{ii}(r) - \nu_i(r))\,dr}.$$

This formula does not, however, allow one to calculate the term on the left exactly since $\nu_i$
also involves sojourn probabilities. All we can say without further assumptions is that

$$p_{\overline{ii}}(s, t) \le e^{\int_s^t \mu_{ii}(r)\,dr}.$$

Suppose, however, that we have a process such that, for all states i, all t and h > 0,

$$q_i(t, t+h) = o(h). \tag{19.14}$$

This assumption says that it is highly unlikely that in a small time interval there is a move out
of a state and back into it. It is the same idea that underlies a Poisson process as discussed
in Section 18.3, where it was highly unlikely to have two occurrences of an event in a small
time interval. When this assumption holds, we have that $\nu_i(t) = 0$ for all t and we can now
state the following theorem.


Theorem 19.4 For a process satisfying (19.14),

$$p_{\overline{ii}}(s, t) = e^{\int_s^t \mu_{ii}(r)\,dr}.$$


Remark We can view the first statement in Theorem 19.3 as the special case of Theorem 19.4
when $q_i$ is actually equal to 0.


An alternative direct proof of Theorem 19.4 is outlined in Exercise 19.18. 

To summarize our conclusions in this section we have shown that given the forces of 
transition, we can in theory, solve a system of differential equations to arrive at the required 
probabilities that we are interested in for our applications. However, in practice, the solution 
can be difficult or impossible to obtain exactly. In the following subsections we look at various 
possibilities for obtaining or approximating the probabilities from the forces. 


19.3.2 Path-by-path analysis


Assume then that we are given $\mu_{ij}(t)$ for all i, j and t, and we want to determine the probability
that starting in state i at any time s, we will be in state j at time t. What we can do in a
relatively straightforward manner is the following. Given any path going from state i to state
j, we can derive a formula for the probability that starting in state i at time s, the process will
be in state j at time t, having followed the prescribed path exactly. Precisely, we are given a
path $\Pi$ which is a sequence of states $i_0, i_1, \ldots, i_n$, where $i_0 = i$ and $i_n = j$. We then want the
probability that the process will make a transition from state $i_0$ to $i_1$, followed by a transition
from $i_1$ to $i_2$, followed by a transition from $i_2$ to $i_3$ and so on, continuing in such a fashion,
without visiting any other states, and finally reaching state j at or before time t, and remaining
in state j until time t. Let us denote such a probability by $p^{\Pi}_{ij}(s, t)$. These can be calculated by
integration, using only the forces and sojourn probabilities. So assuming that (19.14) holds,
we have reduced the problem to one of evaluating integrals.

To illustrate, consider the simplest possible case of a path of length 1, that is $\Pi = (i, j)$.
Then arguing much the same as we did in the single life and multiple decrement models we
can write

$$p^{\Pi}_{ij}(s, t) = \int_s^t p_{\overline{ii}}(s, r)\,\mu_{ij}(r)\,p_{\overline{jj}}(r, t)\,dr.$$


The above formula parallels formula (11.13) except that in that case all non-active states
are absorbing, so one cannot get out of them, which means that the third factor in the integrand
does not appear as it is equal to 1. To recap, the formula above shows that in order to go from i
to j along the one-step path, three events have to occur. First, the process must remain in state
i until some time r between time s and time t. This event has probability $p_{\overline{ii}}(s, r)$. It cannot first
go to another state and return, since that would constitute a different path from the given one.
Secondly, there must be a transition to state j at time r. We can think of the probability of this
as being $\mu_{ij}(r)\,dr$. Finally, the process must remain in state j from time r to time t. It cannot
leave state j and return as this would similarly constitute a different path. We get the required
probability by integrating the product of these probabilities over all times r between s and t.

As the path length increases the formula gets more complicated. Consider a two-step path,
$\Pi = (i, k, j)$. Now we need a double integral. The formula is

$$p^{\Pi}_{ij}(s, t) = \int_s^t \int_{r_1}^t p_{\overline{ii}}(s, r_1)\,\mu_{ik}(r_1)\,p_{\overline{kk}}(r_1, r_2)\,\mu_{kj}(r_2)\,p_{\overline{jj}}(r_2, t)\,dr_2\,dr_1.$$

This looks extremely complex, but it simply records what are now five steps to meet the
required conditions. The process must remain in state i until some time $r_1$, then transfer to
state k, then remain in state k until some time $r_2$, then transfer to state j, and then finally
remain in state j until time t.

A similar calculation that arises for a given path $\Pi$ is to find the probability that starting
in state i at time s, the process will transfer to state j before time t having followed precisely 
the prescribed path. We get the same multiple integral as described above, except the final 
sojourn probability is omitted, since the process need only enter, but not necessarily remain 
in the final state on the path. Of course, when the final state is absorbing, the two problems 
are identical. 


Example 19.6 Consider a healthy-unhealthy-deceased model as introduced in Figure 19.2.
The forces of transition are constant with $\mu_{01} = 2$, $\mu_{02} = 1$, $\mu_{10} = 1$, $\mu_{12} = 3$. Find the
probability that a healthy life will become unhealthy, and then die without recovering before
time 1.


Solution. We have $\mu_{00} = -3$, $\mu_{11} = -4$. The path in question is $\Pi = (0, 1, 2)$ and the required
probability is

$$p^{\Pi}_{02}(0, 1) = \int_0^1 \int_{r_1}^1 e^{-3r_1} \cdot 2 \cdot e^{-4(r_2 - r_1)} \cdot 3\,dr_2\,dr_1.$$

The double integral evaluates to

$$\frac{6}{4}\left[\frac{1 - e^{-3}}{3} - e^{-3} + e^{-4}\right] = 0.4279.$$

In general, a path of length n will require an n-dimensional integral with an integrand
consisting of 2n + 1 factors, so this can quickly become computationally infeasible. Aside
from that there is another difficulty. Usually, what we really want is $p_{ij}(s, t)$. In some cases this
might be obtained by adding up $p^{\Pi}_{ij}(s, t)$ for all possible paths $\Pi$ from i to j. However, even in
the simplest of models, there could well be an infinite number of such paths and this procedure
will not work. This will normally occur whenever there is a positive probability of two-way
transitions, as in the healthy-unhealthy-deceased model where healthy people can recover.
A person could go from a healthy state to the deceased state, by getting sick and dying, or
getting sick, then recovering, then getting sick again and dying, or recovering twice before
death, or any of an infinite number of possibilities. We need other approaches to calculating
transition probabilities, which we discuss in the next two subsections.


19.3.3 Numerical approximation 


There are various ways to find a numerical approximation to the solution of the Kolmogorov
equations. We will describe one approach that is fairly simple to implement. It is similar to
Euler's method, described in Section 8.9, for Thiele's equation. For a fixed time s, the goal is
to approximate the probabilities $p_{ij}(s, t)$. Pick a small interval of time h, and change our unit
of time, so that each original time unit is now 1/h units. (So, for example, if our original time
unit was a year, and h = 1/12, this changes the time unit to months.) We then essentially
define a discrete time multi-state model, which will in general be nonstationary. We will
define one step transition probabilities for this model by simply looking at formulas (19.7)
and (19.8), which formed the basis of the Kolmogorov equations, only omitting the o(h) terms
in each case. Since these terms become very small as h does, we can hope that for small
enough h, we obtain a good approximation. We also switch the origin of time so that the
original time s is now time 0 and the original time s + mh for an integer m is now time m.
We then take our approximating discrete non-stationary Markov chain to be that in which the
transition probabilities, denoted by $\tilde{p}$, are given by

$$\tilde{p}_{ij}(m, m+1) = h\mu_{ij}(s + mh), \quad i \neq j,$$

$$\tilde{p}_{ii}(m, m+1) = h\mu_{ii}(s + mh) + 1.$$

We then obtain transition matrices $P_m$ which have for an (i, j)th entry the quantity
$\tilde{p}_{ij}(m, m+1)$.

Another way of looking at this is that we simply get the matrix $P_m$ by taking the intensity
matrix M(s + mh), multiplying all non-diagonal entries by h and adjusting the diagonal entries
so that the rows add to 1.


When t = s + mh for some integer m we can then obtain approximations by

$$p_{ij}(s, t) \approx \text{the } (i, j)\text{th entry of } P_0P_1 \cdots P_{m-1}.$$


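Example 19.7 In a chain with three states, you are given that

$$\mu_{01}(t) = \mu_{10}(t) = 0.09(1 + t), \quad \mu_{02}(t) = 0.18(1 + t), \quad \mu_{12}(t) = 0.09, \quad \mu_{20}(t) = \mu_{21}(t) = 0.$$

Find an approximation to $p_{01}(1, 2)$ using the method described above with h = 1/3.

Solution. We have

$$M(1) = \begin{pmatrix} -0.54 & 0.18 & 0.36 \\ 0.18 & -0.27 & 0.09 \\ 0 & 0 & 0 \end{pmatrix},$$

so that

$$P_0 = \begin{pmatrix} 0.82 & 0.06 & 0.12 \\ 0.06 & 0.91 & 0.03 \\ 0 & 0 & 1 \end{pmatrix},$$

and similarly

$$P_1 = \begin{pmatrix} 0.79 & 0.07 & 0.14 \\ 0.07 & 0.90 & 0.03 \\ 0 & 0 & 1 \end{pmatrix}, \qquad P_2 = \begin{pmatrix} 0.76 & 0.08 & 0.16 \\ 0.08 & 0.89 & 0.03 \\ 0 & 0 & 1 \end{pmatrix}.$$

Then the approximation to $p_{01}(1, 2)$ is the entry in row 0 and column 1 of $P_0P_1P_2$, which is
0.15131.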
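
The approximation scheme of this subsection is easy to automate. The following is a minimal sketch for the chain of Example 19.7, written with plain Python lists; the function names (`intensity`, `one_step`, `approx_transition`) are illustrative choices and not notation from the text.

```python
def intensity(t):
    """Intensity matrix M(t) for the three-state chain of Example 19.7."""
    m01 = m10 = 0.09 * (1 + t)
    m02 = 0.18 * (1 + t)
    m12 = 0.09
    return [[-(m01 + m02), m01, m02],
            [m10, -(m10 + m12), m12],
            [0.0, 0.0, 0.0]]


def one_step(t, h):
    """Approximate one-step matrix I + h*M(t): formulas (19.7), (19.8) without o(h)."""
    M = intensity(t)
    return [[(1.0 if i == j else 0.0) + h * M[i][j] for j in range(3)]
            for i in range(3)]


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def approx_transition(s, t, steps):
    """Approximate P(s, t) by the product P_0 P_1 ... P_{steps-1} with h = (t - s)/steps."""
    h = (t - s) / steps
    P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for m in range(steps):
        P = matmul(P, one_step(s + m * h, h))
    return P


P_approx = approx_transition(1.0, 2.0, 3)   # h = 1/3 as in the example
p01 = P_approx[0][1]
print(p01)
```

Calling `approx_transition(1.0, 2.0, steps)` with a larger number of steps gives a finer approximation at the cost of more matrix products.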


19.3.4 Stationary continuous time processes 


One case where we can make some progress in solving the Kolmogorov equations directly is
that of a stationary process as defined in Chapter 18. We then have that each $\mu_{ij}(t)$ is a constant,
denoted by $\mu_{ij}$. Moreover, the process will be determined by the quantities $p_{ij}(t) = p_{ij}(0, t)$
since the stationary condition implies that $p_{ij}(s, t) = p_{ij}(t - s)$.

As an example, we can write down completely the solution for the general two-state
stationary process. Suppose the intensity matrix is

$$M = \begin{pmatrix} -\mu & \mu \\ \nu & -\nu \end{pmatrix}.$$

Let $\rho(t) = e^{-(\mu + \nu)t}$. Then the transition matrix P(t), which has the entry of $p_{ij}(t)$ in the ith row
and jth column, is given by

$$P(t) = (\mu + \nu)^{-1} \begin{pmatrix} \mu\rho(t) + \nu & -\mu\rho(t) + \mu \\ -\nu\rho(t) + \nu & \nu\rho(t) + \mu \end{pmatrix}. \tag{19.15}$$

We can verify that P satisfies equations (19.11) and (19.12) by direct calculation, noting that
$d\rho(t)/dt = -(\mu + \nu)\rho(t)$, and that $\rho(0) = 1$.

For the general stationary process, solutions to P' = PM can be generated from eigenvectors
of M. These are non-zero vectors $a = (a_0, a_1, \ldots, a_N)$ for which

$$aM = \lambda a$$

for some constant $\lambda$ known as an eigenvalue of M. (The left hand side above is a matrix
multiplication in which we view a as a $1 \times (N + 1)$ matrix. According to the convention we have
used for the transition probabilities, the vector appears on the left.) To obtain eigenvectors, we
first find the eigenvalues as those constants $\lambda$ for which the determinant of $\lambda I - M$ is 0. This
involves solving a polynomial equation in $\lambda$. We can then solve for the resulting eigenvectors.
Examples will follow.

Note now that given any such eigenvector a and any fixed i, the matrix P that has all
zero entries except for an ith row of $(a_0e^{\lambda t}, a_1e^{\lambda t}, \ldots, a_Ne^{\lambda t})$ is a solution to P' = PM. This
follows since the ijth entry of PM will be $\lambda a_je^{\lambda t}$, which is the derivative of the corresponding
entry in P.

Note next that given any finite number of solutions to P' = PM, the linearity implies that
any linear combination of these solutions will also be a solution. We therefore seek a linear
combination that will satisfy the initial conditions. This can always be done in the case that
we can find N + 1 linearly independent eigenvectors. We illustrate with a simple example.


Example 19.8 For the data as given in Example 19.6, find the probabilities of the following 
events. 

(a) A life now disabled will be active at time 0.5. 

(b) A life now active will be deceased at time 1. 


Solution. The intensity matrix is

$$M = \begin{pmatrix} -3 & 2 & 1 \\ 1 & -4 & 3 \\ 0 & 0 & 0 \end{pmatrix}.$$

We can see immediately that 0 is an eigenvalue with an eigenvector of (0, 0, 1).

We can also note that since the rows of P add up to 1, the rows of M add to 0, and the
last row of P is (0, 0, 1), it is sufficient to solve the reduced system $(P^r)' = P^rM^r$ in which the
superscript r indicates a matrix with the row N and column N removed. So we can consider
eigenvectors of

$$M^r = \begin{pmatrix} -3 & 2 \\ 1 & -4 \end{pmatrix}.$$

The determinant of $\lambda I - M^r$ is $\lambda^2 + 7\lambda + 10$, giving eigenvalues of $-5$ and $-2$. Solving, we
find respective eigenvectors of $(1, -2)$ and $(1, 1)$. (Note that eigenvectors are not unique as
they can be multiplied by any non-zero constant.)

To satisfy the initial conditions, we want the first row of $P^r$ to be the particular linear
combination of the vectors $a = e^{-5t}(1, -2)$ and $b = e^{-2t}(1, 1)$ which gives the vector (1, 0)
when t = 0. We solve this to get a first row vector of (1/3)a + (2/3)b. For the second row,
we want a linear combination that gives the vector (0, 1) when t = 0. We solve this to get a
second row vector equal to (−1/3)a + (1/3)b. Transition probabilities are then given by

$$p_{00}(0, t) = \tfrac{1}{3}e^{-5t} + \tfrac{2}{3}e^{-2t}, \qquad p_{01}(0, t) = -\tfrac{2}{3}e^{-5t} + \tfrac{2}{3}e^{-2t},$$

$$p_{10}(0, t) = -\tfrac{1}{3}e^{-5t} + \tfrac{1}{3}e^{-2t}, \qquad p_{11}(0, t) = \tfrac{2}{3}e^{-5t} + \tfrac{1}{3}e^{-2t}.$$

The answers to the particular questions asked are
(a) $p_{10}(0, 0.5) = 0.0953$.

(b) $1 - p_{00}(0, 1) - p_{01}(0, 1) = 0.8218$.
(The answer to part (b) is of course larger than that of Example 19.6 since here we
allow for an arbitrary number of recoveries from being sick, as well as death from a
healthy state.)
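
One way to gain confidence in the solution of Example 19.8 is to check numerically that it satisfies P' = PM and the initial condition P(0) = I. The sketch below does this with a central finite difference; the step size `eps` and the function names are assumptions made for illustration.

```python
import math

# Intensity matrix of Example 19.8 (states 0, 1, 2).
M = [[-3.0, 2.0, 1.0],
     [1.0, -4.0, 3.0],
     [0.0, 0.0, 0.0]]


def transition_matrix(t):
    """P(0, t) built from the eigenvector solution; the last column makes rows sum to 1."""
    a, b = math.exp(-5 * t), math.exp(-2 * t)
    p00 = a / 3 + 2 * b / 3
    p01 = -2 * a / 3 + 2 * b / 3
    p10 = -a / 3 + b / 3
    p11 = 2 * a / 3 + b / 3
    return [[p00, p01, 1 - p00 - p01],
            [p10, p11, 1 - p10 - p11],
            [0.0, 0.0, 1.0]]


def forward_residual(t, eps=1e-6):
    """Largest entry of |P'(t) - P(t) M|, with P' taken by central difference."""
    ahead, behind, now = transition_matrix(t + eps), transition_matrix(t - eps), transition_matrix(t)
    worst = 0.0
    for i in range(3):
        for j in range(3):
            derivative = (ahead[i][j] - behind[i][j]) / (2 * eps)
            rhs = sum(now[i][k] * M[k][j] for k in range(3))
            worst = max(worst, abs(derivative - rhs))
    return worst


part_a = transition_matrix(0.5)[1][0]   # disabled now, active at time 0.5
part_b = transition_matrix(1.0)[0][2]   # active now, deceased at time 1
residual = forward_residual(0.7)
print(part_a, part_b, residual)
```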


19.3.5 Some methods for non-stationary processes 


The above procedure can also be used to solve the problem of determining probabilities from
the forces, when these are ‘piecewise constant’. We can find transition matrices for each
time interval on which the forces are constant, and then use the discrete time technique. The 
following simple example illustrates the procedure. It involves only two time intervals but the 
method can easily be extended. 


Example 19.9 For a two-state model, we have forces of transition given by

$$\mu_{01}(t) = \begin{cases} 1, & \text{if } 0 \le t < 2, \\ 2, & \text{if } 2 \le t < 3, \end{cases}$$

$$\mu_{10}(t) = \begin{cases} 2, & \text{if } 0 \le t < 2, \\ 3, & \text{if } 2 \le t < 3. \end{cases}$$

Find the probability that the process will be in state 1 at time 2.5 given that it is in state 0 at
time 0.


Solution.

$$p_{01}(0, 2.5) = p_{00}(0, 2)\,p_{01}(2, 2.5) + p_{01}(0, 2)\,p_{11}(2, 2.5).$$

From the matrix (19.15), we have

$$p_{00}(0, 2) = \frac{2 + e^{-6}}{3}, \quad p_{01}(0, 2) = \frac{1 - e^{-6}}{3}, \quad p_{01}(2, 2.5) = \frac{2 - 2e^{-2.5}}{5}, \quad p_{11}(2, 2.5) = \frac{2 + 3e^{-2.5}}{5}.$$


Substituting, the required probability is 0.39446. 
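
The piecewise-constant technique amounts to multiplying the two-state matrices (19.15) for successive intervals. Here is a minimal sketch for Example 19.9, taking the forces to be (1, 2) on the first interval and (2, 3) on the second, as the solution's formulas imply; the helper names are illustrative.

```python
import math


def two_state_matrix(mu, nu, duration):
    """Matrix (19.15) for intensity matrix [[-mu, mu], [nu, -nu]] over the given duration."""
    rho = math.exp(-(mu + nu) * duration)
    total = mu + nu
    return [[(mu * rho + nu) / total, (mu - mu * rho) / total],
            [(nu - nu * rho) / total, (nu * rho + mu) / total]]


def matmul2(A, B):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


first = two_state_matrix(1.0, 2.0, 2.0)    # forces on [0, 2), used for 2 time units
second = two_state_matrix(2.0, 3.0, 0.5)   # forces on [2, 3), used from time 2 to 2.5
P_total = matmul2(first, second)           # discrete-time product over the two intervals
p01 = P_total[0][1]
print(p01)
```

With more intervals, the same product is simply extended by one factor per interval.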


One method for handling a perfectly general chain is to approximate it by a piecewise
constant one, and then use the method outlined above. To do so, one can choose a small time
unit, and approximate the forces of transition by replacing them with those that are constant 
on each time interval, possibly using the midpoint value. 


19.3.6 Extension of the common shock model 


As mentioned in the introduction, multiple life theory, multiple decrement theory, or the more 
general model discussed in Chapter 17 can all be looked at in a multi-state framework. In 
the standard cases where failure times are independent, this does not normally provide any 
advantage over the conventional treatments that we have described previously. The multi-state 
approach, however, can be useful in modelling certain types of dependence. As an illustration, 
we will revisit the common shock model of Section 17.5 and discuss some possible refinements.
We confine the analysis to the case of two types of failure, the first with failure time equal
to $\min(T_1^*, Z)$ and the second with failure time equal to $\min(T_2^*, Z)$. Failure for the two types is
dependent due to the common shock but there will be additional sources of dependence when
$T_1^*$ and $T_2^*$ are themselves not independent. An example of this is the ‘broken-heart’ syndrome
mentioned in Section 10.2 where one type of failure can hasten failure of the other type.

We consider a multi-state model with four states, as given in Figure 19.1, but adapted to
cover general failure times. State 0 means that neither type of failure has occurred, state 1
means that only the second cause of failure has occurred, state 2 means that only the first type
of failure has occurred, and state 3 means that both types of failure have occurred. (The states
1 and 2 then are labelled by the number of the surviving type.) Let $\mu_i(t)$ denote the hazard
function for $T_i^*$, and $\rho(t)$ denote the hazard function for Z.

Suppose that forces of failure are of the form

$$\mu_{01}(t) = \mu_2(t), \quad \mu_{02}(t) = \mu_1(t), \quad \mu_{03}(t) = \rho(t),$$

$$\mu_{13}(t) = \mu_1(t) + \rho(t) + e_1(t), \quad \mu_{23}(t) = \mu_2(t) + \rho(t) + e_2(t)$$

for some functions $e_1(t)$ and $e_2(t)$. When these are 0, we have precisely the common shock
model. In this more general setting, we use $e_i(t)$ to build in some dependence between $T_1^*$
and $T_2^*$ by letting the distributions change after the first failure. The ‘broken-heart syndrome’
would call for positive values of $e_i(t)$. On the other hand, there may be cases where we assume
that the prospects for failure type i improve after the first failure, and we would reflect this
with a negative $e_i(t)$.

Probabilities in this model can all be calculated by the procedure of Section 19.3.2 since 
there are at most three paths between any two states. Moreover, since no state can be re-entered 
once the process leaves it, we can use Theorem 19.4 for the sojourn probabilities. 

For the following examples we assume that all forces are constant.


Example 19.10 Calculate the probability that at time n, the first type of failure will have
occurred but not the second.


Solution. From Theorem 19.1,

$$\mu_{00} = -(\mu_1 + \mu_2 + \rho), \qquad \mu_{22} = -(\mu_2 + \rho + e_2).$$

We want

$$p_{02}(0, n) = \int_0^n p_{\overline{00}}(0, s)\,\mu_{02}\,p_{\overline{22}}(s, n)\,ds = \mu_1 \int_0^n e^{-(\mu_1 + \mu_2 + \rho)s}\,e^{-(\mu_2 + \rho + e_2)(n - s)}\,ds.$$

The integral is easily evaluated to give

$$p_{02}(0, n) = \mu_1 \left[\frac{e^{-(\mu_2 + \rho + e_2)n} - e^{-(\mu_1 + \mu_2 + \rho)n}}{\mu_1 - e_2}\right]. \tag{19.16}$$

As a check, suppose that $e_1 = e_2 = \rho = 0$, so we simply have two independent exponential
failure times. The answer reduces to

$$e^{-\mu_2 n}\left(1 - e^{-\mu_1 n}\right),$$


which is just the probability that the first type of failure occurred before time n multiplied by 
the probability that the second type did not occur before time n. 
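
Formula (19.16) can be checked against a direct numerical evaluation of the integral in Example 19.10. In the sketch below the parameter values (mu1 = 0.03, mu2 = 0.02, rho = 0.01, e2 = 0.005, n = 10) are hypothetical, chosen only for illustration, and the closed form assumes that mu1 differs from e2.

```python
import math


def p02_closed(n, mu1, mu2, rho, e2):
    """Formula (19.16); requires mu1 != e2."""
    top = math.exp(-(mu2 + rho + e2) * n) - math.exp(-(mu1 + mu2 + rho) * n)
    return mu1 * top / (mu1 - e2)


def p02_numeric(n, mu1, mu2, rho, e2, steps=20000):
    """Midpoint-rule value of the integral of p00(0,s) * mu_02 * p22(s,n) over (0, n)."""
    h = n / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        stay_in_0 = math.exp(-(mu1 + mu2 + rho) * s)        # no failure up to time s
        stay_in_2 = math.exp(-(mu2 + rho + e2) * (n - s))   # no further failure up to time n
        total += stay_in_0 * mu1 * stay_in_2
    return total * h


# Hypothetical constant forces, for illustration only.
closed = p02_closed(10.0, 0.03, 0.02, 0.01, 0.005)
numeric = p02_numeric(10.0, 0.03, 0.02, 0.01, 0.005)

# Independence check from the text: e2 = rho = 0.
independent = p02_closed(10.0, 0.03, 0.02, 0.0, 0.0)
print(closed, numeric, independent)
```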


Example 19.11 Find the probability that the second cause of failure occurs before time n
and after the first cause of failure.


Solution. If $\Pi$ is the path (0, 2, 3) we want

$$p^{\Pi}_{03}(0, n) = \int_0^n p_{02}(0, t)(\mu_2 + \rho + e_2)\,dt$$

(a special case of formula (19.17) in the next section). Substituting from (19.16) and integrating,
the answer is

$$\frac{\mu_1(\mu_2 + \rho + e_2)}{\mu_1 - e_2}\left[\frac{1 - e^{-n(\mu_2 + \rho + e_2)}}{\mu_2 + \rho + e_2} - \frac{1 - e^{-n(\mu_1 + \mu_2 + \rho)}}{\mu_1 + \mu_2 + \rho}\right].$$

When $e_2 = 0$ we recover the answer of Example 17.12 (a).


19.3.7 Insurance and annuity applications in continuous time 


Consider an insurance contract that provides, for two fixed states j and k, payments at the
moment of transfer whenever a transfer occurs from state j to state k. The amount paid for a
transfer at time t will be $b_t$ (denoted by $b_t^{(jk)}$ if we wish to distinguish between several transfer
benefits). Suppose the process is in state i at time 0. Then the actuarial present value parallels
the result we saw in the multiple decrement case and is given by

$$\int_0^\infty b_t\,v(t)\,p_{ij}(0, t)\,\mu_{jk}(t)\,dt.$$

If benefits are to be paid on only the first transfer from state j to state k, then this can be
handled by adding new states, exactly as we did in the discrete case.

Similarly, for a contract that provides payments made continuously at the periodic rate of
$c_t$ (written as $c_t^{(j)}$ if needed to distinguish states) at time t, provided the process is in state j, the
present value is given by

$$\int_0^\infty c_t\,v(t)\,p_{ij}(0, t)\,dt.$$

Some annuities may make use of the sojourn probabilities. Suppose, for example, that at
time 0, the process is in state i, and payments at the periodic rate of $c_t$ are paid as long as the
process remains in state i. All payments stop upon the first exit from state i. The present value
is given by

$$\int_0^\infty c_t\,v(t)\,p_{\overline{ii}}(0, t)\,dt.$$

It is possible to fit the continuous time insurances and annuities into a stochastic model,
and calculate variances as well as expected values, as we did in the discrete case. We will not
pursue this here. In general, this will require integration of functions whose values are random
variables rather than definite numbers. This presents both theoretical and computational
difficulties.

More general probabilities can be deduced from the insurance formulas by the standard
method we described in Section 10.8.4 of taking zero interest and benefit functions that take
the value 1 or 0. For example, suppose the process is in state i. The probability that within n
years it will at some point make a transfer from state j to state k is given by

$$\int_0^n p_{ij}(0, t)\,\mu_{jk}(t)\,dt. \tag{19.17}$$


Example 19.12 An insurance contract based on the model in Example 19.8 provides for a
benefit at time t of $e^{0.04t}$ provided that a person now healthy dies while unhealthy. The force
of interest is a constant 5%. Find the actuarial present value.


Solution. This is given directly by

$$\int_0^\infty e^{0.04t}\,e^{-0.05t}\,p_{01}(0, t)\,\mu_{12}(t)\,dt = 3\int_0^\infty e^{-0.01t}\left(\frac{2e^{-2t} - 2e^{-5t}}{3}\right)dt = 0.596.$$


The following is an example which can be thought of as a generalization of contingent
insurances. Suppose we have a path of states $\Pi = (i_0, i_1, \ldots, i_n)$, as described in Section 19.3.2.
At time 0, the process is in state $i_0$ and an insurance contract provides a payment of 1 at the
moment that the process enters state $i_n$, having previously followed precisely the path $\Pi$. For
example, consider the chain covering the two lives (x) and (y) as given in the introduction
and the path $\Pi = (0, 1, 3)$. The contract based on this path is the second death-contingent
insurance paying upon the death of (x) if this occurs after the death of (y).


Example 19.13 Suppose that the force of interest and all forces of transition are constant.
Find a formula for $A_\Pi$, the actuarial present value of the insurance based on the path $\Pi$.

Solution. To simplify notation, let $\nu_j = \mu_{i_j i_{j+1}}$ and $\mu_j = -\mu_{i_j i_j}$. Then, following the explanation
for the calculation of probabilities in Section 19.3.2, we can set up the following multiple
integral in which $r_j$ denotes the time of transfer from state $i_{j-1}$ to state $i_j$:

$$A_\Pi = \int_0^\infty \int_{r_1}^\infty \cdots \int_{r_{n-1}}^\infty \nu_0\nu_1 \cdots \nu_{n-1}\,e^{-\mu_0 r_1}\,e^{-\mu_1(r_2 - r_1)} \cdots e^{-\mu_{n-1}(r_n - r_{n-1})}\,e^{-\delta r_n}\,dr_n\,dr_{n-1} \cdots dr_1.$$

The integration can be carried out in a straightforward manner to give

$$A_\Pi = \frac{\nu_0\nu_1 \cdots \nu_{n-1}}{(\mu_0 + \delta)(\mu_1 + \delta) \cdots (\mu_{n-1} + \delta)}.$$

The formula is quite easy to remember and to compute from. The numerator is the product
of all the forces of transition along the path. The denominator is a product of factors, one for
each state on the path, except for the final one. Each such factor is the sum of all outgoing forces
from that state plus $\delta$. We can take $\delta = 0$ to give the probability that the path $\Pi$ will be followed.

Note that Formula (8.28), Example 10.5 and Formula (10.25) with substitution from these 
first two are special cases of the above. 
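
The product formula of Example 19.13 translates directly into a few lines of code. The sketch below stores the constant forces in a nested dictionary (a representation chosen here for convenience) and, as an illustration, applies the formula to the forces of Example 19.6 and the path (0, 1, 2); the function name `path_apv` is hypothetical.

```python
def path_apv(forces, path, delta=0.0):
    """Product over the path of (force along the path) / (total outgoing force + delta).

    forces[i][j] is the constant force of transition from state i to state j (i != j).
    With delta = 0 the result is the probability that the path is eventually followed.
    """
    value = 1.0
    for here, nxt in zip(path, path[1:]):
        total_out = sum(forces[here].values())   # sum of all outgoing forces from `here`
        value *= forces[here][nxt] / (total_out + delta)
    return value


# Forces of Example 19.6: healthy (0), unhealthy (1), deceased (2).
forces = {0: {1: 2.0, 2: 1.0},
          1: {0: 1.0, 2: 3.0},
          2: {}}

follow_probability = path_apv(forces, (0, 1, 2))          # delta = 0
apv_at_5_percent = path_apv(forces, (0, 1, 2), delta=0.05)
print(follow_probability, apv_at_5_percent)
```

The probability here has no time limit, so it exceeds the value found in Example 19.6, where the path had to be completed before time 1.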


19.4 Recursion and differential equations for multi-state reserves


The basic idea of calculating reserves for multi-state contracts is the same as we have seen,
except that the reserve must be calculated for each state (an idea we have already encountered
in Section 10.4.2). We let ${}_kV^{(j)}$ denote the reserve at time k when in state j. Given that
the process is in a certain state, the reserve at any time is as usual, the present value of 
future benefits less the present value of future premiums. Recursion and differential equation 
formulas can become more complicated. 

We start first by looking at the discrete time case, where we assume transfer benefits are 
payable at the end of the period of transfer. 


Example 19.14 Consider the healthy-unhealthy-deceased model, and suppose that we have 
a stationary chain with transition matrix 


$$P = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.2 & 0.6 & 0.2 \\ 0 & 0 & 1 \end{pmatrix}.$$


An insurance contract provides for payments of 2 when a healthy life becomes disabled, 
3 when a healthy life dies, and 4 when a disabled life dies. All benefits are paid at the end 
of the year of the change of state. This is paid for by a single premium. Assume the interest 
rate is 0. Find the reserves at time k, for k > 0. 


Solution. This can be done by recursion, but we will first do it in a more complicated way, so 
that the ease of the recursion formula method can be better appreciated. We could consider this 
as three separate contracts, but it is just as easy to do it all at once. In view of the stationarity, 
the reserve at time k for k > 0 will be independent of the time. It will, however, depend on the 
state at time k. Let $a = {}_kV^{(0)}$ and $b = {}_kV^{(1)}$. In view of the single premium, a will just equal 
the present value of future benefits given the state is 0. Therefore, summing over all possible 
transfer times j, we have 


$$a = 2 \sum_{j=0}^{\infty} p_{00}(0,j)\, p_{01} + 3 \sum_{j=0}^{\infty} p_{00}(0,j)\, p_{02} + 4 \sum_{j=0}^{\infty} p_{01}(0,j)\, p_{12}. \tag{19.18}$$


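Now, in this case, the matrix P is of a particularly simple form. Namely, 


$$P = \begin{pmatrix} \dfrac{c+d}{2} & \dfrac{c-d}{2} & 1-c \\[2mm] \dfrac{c-d}{2} & \dfrac{c+d}{2} & 1-c \\[2mm] 0 & 0 & 1 \end{pmatrix}.$$

It is not hard to see that 

$$P^k = \begin{pmatrix} \dfrac{c^k+d^k}{2} & \dfrac{c^k-d^k}{2} & 1-c^k \\[2mm] \dfrac{c^k-d^k}{2} & \dfrac{c^k+d^k}{2} & 1-c^k \\[2mm] 0 & 0 & 1 \end{pmatrix}.$$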


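In the present case, we have c = 0.8, d = 0.4. Substituting in (19.18), we easily sum the infinite 
geometric progressions to get 


$$a = 2 \cdot \tfrac{1}{2}\left(\tfrac{1}{0.2} + \tfrac{1}{0.6}\right)(0.2) + 3 \cdot \tfrac{1}{2}\left(\tfrac{1}{0.2} + \tfrac{1}{0.6}\right)(0.2) + 4 \cdot \tfrac{1}{2}\left(\tfrac{1}{0.2} - \tfrac{1}{0.6}\right)(0.2) = \frac{14}{3}.$$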


We will leave it to the reader to verify similarly that b = 13/3. 
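
As a check on the arithmetic, the following Python sketch evaluates both reserves in two ways: from the closed-form geometric sums, and by brute-force partial sums over the transition matrix. The function names are illustrative assumptions, and the truncation at 200 terms is an arbitrary choice.

```python
from fractions import Fraction as F

# Transition matrix of Example 19.14 (states: 0 healthy, 1 unhealthy, 2 deceased).
P = [[F(3, 5), F(1, 5), F(1, 5)],
     [F(1, 5), F(3, 5), F(1, 5)],
     [F(0), F(0), F(1)]]


def closed_form_reserves(c=F(4, 5), d=F(2, 5)):
    """Reserves (a, b) from the summed geometric progressions."""
    s_same = (1 / (1 - c) + 1 / (1 - d)) / 2   # sum over j of p_00(0, j)
    s_cross = (1 / (1 - c) - 1 / (1 - d)) / 2  # sum over j of p_01(0, j)
    a = 2 * s_same * P[0][1] + 3 * s_same * P[0][2] + 4 * s_cross * P[1][2]
    b = 2 * s_cross * P[0][1] + 3 * s_cross * P[0][2] + 4 * s_same * P[1][2]
    return a, b


def partial_sum_reserve(start, terms=200):
    """Expected future benefits from state `start`, truncated after `terms` years."""
    row = [F(int(s == start)) for s in range(3)]  # state distribution at time j
    total = F(0)
    for _ in range(terms):
        # Expected benefit paid at the end of the coming year.
        total += row[0] * (2 * P[0][1] + 3 * P[0][2]) + row[1] * 4 * P[1][2]
        row = [sum(row[i] * P[i][j] for i in range(3)) for j in range(3)]
    return total
```

Here `closed_form_reserves()` returns the pair (14/3, 13/3), and the partial sums from either starting state agree with these values to many decimal places.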


We now look at general recursion formulas. Suppose we have a contract providing, for each 
i, j, benefits of $b_k^{(ij)}$ at time k + 1, provided there is a transfer from state i to state j between time 
k and time k + 1. If we want to include expenses of payment, these can be incorporated into 
the transfer benefits. We can assume that for each state i, we have $b_k^{(ii)} = 0$, since payments 
for remaining in a state will be handled by annuity type benefits. We let $\pi_k^{(i)}$ denote the net 
payment collected at time k assuming the process is in state i. These will be the premium less 
any annuity payments paid out and less any expenses if applicable. Then if the system is in 
state i at time k, 


$${}_{k+1}V^{(i)} = \left({}_kV^{(i)} + \pi_k^{(i)}\right)(1 + i_k) - \sum_{j \ne i} p_{ij}(k, k+1)\left(b_k^{(ij)} + {}_{k+1}V^{(j)} - {}_{k+1}V^{(i)}\right). \tag{19.19}$$


Note that this is somewhat more complicated than previous recursion formulas, where the 
transfer was always to the deceased state, for which no reserves are required. In this more 
general case, upon transfer from state i to state j, we pay out the net amount at risk, and then 
also must set up the reserve required for state j, necessitating the extra term of ${}_{k+1}V^{(j)}$ above. 

The following rearranged version of the recursion formula is instructive as well as being 
well suited to computation. For each state i, 


$$\left({}_kV^{(i)} + \pi_k^{(i)}\right)(1 + i_k) = \sum_{j} p_{ij}(k, k+1)\left(b_k^{(ij)} + {}_{k+1}V^{(j)}\right).$$


This simply says that the amount accumulated at the end of a period must provide for all 
the benefits paid out at that time, as well as all the reserves that are needed in the various 
states. 
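
The rearranged recursion translates directly into code. The sketch below solves it for the time-k reserve in one state, given the reserves required at time k + 1; the function name and argument layout are assumptions made for illustration.

```python
from fractions import Fraction as F


def reserve_at_start(i, P, benefits, next_reserves, net_payment, interest):
    """Time-k reserve in state i from the rearranged recursion.

    P[i][j]          : one-period transition probability from i to j
    benefits[i][j]   : transfer benefit paid at time k+1 (0 when i == j)
    next_reserves[j] : reserve required at time k+1 in state j
    net_payment      : net payment collected at time k in state i
    interest         : interest rate for the period
    """
    # Amount needed at the end of the period: benefits plus new reserves.
    needed = sum(P[i][j] * (benefits[i][j] + next_reserves[j])
                 for j in range(len(P)))
    return needed / (1 + interest) - net_payment
```

With the data of Example 19.14 (zero interest, no premiums after issue, and end-of-year reserves 14/3, 13/3 and 0), the function returns 14/3 for state 0 and 13/3 for state 1, as stationarity requires.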

Solving this general recursion to get reserves is not as straightforward as the cases we 
have previously looked at where reserves were all 0 except for one state. In general, we need 
to solve a system of linear equations rather than a single equation. 


Example 19.15 Solve the previous example by recursion. 
Solution. Substitute in (19.19) to get 

$$a = a - 0.2(2 + b - a) - 0.2(3 - a), \qquad b = b - 0.2(a - b) - 0.2(4 - b), \tag{19.20}$$

which simplifies to 

$$0.4a - 0.2b = 1, \qquad -0.2a + 0.4b = 0.8,$$

and this is solved to give 

$$a = 14/3, \qquad b = 13/3.$$

Consider a variation on the above where a level premium of $\pi$ is paid as long as the person 
is healthy. We now need a $\pi$ on the right hand side of the first equation in (19.20), and solving 
will lead to 


$$a = \frac{14 - 10\pi}{3}, \qquad b = \frac{13 - 5\pi}{3}.$$


Interestingly enough, we can now calculate the equivalence principle premium. In view of 
the stationarity, this must be the value of $\pi$ which makes a = 0. This will be $\pi = 7/5$, and the 
corresponding reserve for the unhealthy lives will be 2. The value of 7/5 can also be verified 
directly by dividing the benefit present value of 14/3 by the value of an annuity of 1 per year 
while the life is healthy. This is calculated as in Example 19.14 by summing the appropriate 
geometric progression to give (1/2)(1/0.2 + 1/0.6) = 10/3. The disabled state reserve of 2 can 
also be verified since 2 = 0.6(2) + 0.2(4), showing that 2 will provide for the same reserve 
for those who remain disabled and a death benefit of 4 for those who die. 
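
The two-equation system, with or without the level premium, can be solved exactly by Cramer's rule. The following sketch uses exact fractions; the function name `solve_reserves` is an assumption for illustration.

```python
from fractions import Fraction as F


def solve_reserves(premium=F(0)):
    """Solve 0.4a - 0.2b = 1 - premium and -0.2a + 0.4b = 0.8 for (a, b)."""
    a11, a12, r1 = F(2, 5), F(-1, 5), 1 - premium
    a21, a22, r2 = F(-1, 5), F(2, 5), F(4, 5)
    det = a11 * a22 - a12 * a21
    # Cramer's rule for a 2-by-2 linear system.
    a = (r1 * a22 - a12 * r2) / det
    b = (a11 * r2 - a21 * r1) / det
    return a, b
```

Calling `solve_reserves()` gives (14/3, 13/3), and `solve_reserves(F(7, 5))` gives (0, 2), matching the equivalence principle premium and the unhealthy-state reserve found above.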

In the continuous case, we have Thiele's differential equations for the multi-state reserves. 
Consider a contract providing benefits, including accompanying expenses, of $b_t^{(ij)}$ at time t 
for transfer from state i to state j. Net payments are collected continuously, where the 
periodic rate of payment at time t is $\pi_t^{(i)}$ when the system is in state i. We then have a system 
of differential equations, one for each state i, as follows: 


$$\frac{d}{dt}\,{}_tV^{(i)} = \delta_t\,{}_tV^{(i)} + \pi_t^{(i)} - \sum_{j \ne i} \mu_{ij}(t)\left(b_t^{(ij)} + {}_tV^{(j)} - {}_tV^{(i)}\right),$$


which we can also write as 


$$\frac{d}{dt}\,{}_tV^{(i)} = \delta_t\,{}_tV^{(i)} + \pi_t^{(i)} - \sum_{j} \mu_{ij}(t)\left(b_t^{(ij)} + {}_tV^{(j)}\right) \tag{19.21}$$

by invoking Theorem 19.1. 
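
A minimal Python sketch of the right-hand side of the first form of Thiele's equations is given below. It evaluates the derivatives at a single instant, so the forces, benefits and payment rates passed in are those in effect at that time; the function name and the two-state test data in the note are illustrative assumptions.

```python
def thiele_rhs(V, delta, pi, mu, b):
    """Derivatives dV_i/dt at one instant, one entry per state.

    V[i]     : current reserve in state i
    delta    : force of interest at this instant
    pi[i]    : rate of net payment collected in state i
    mu[i][j] : force of transition from i to j (diagonal entries are ignored)
    b[i][j]  : benefit paid on transfer from i to j
    """
    n = len(V)
    return [
        delta * V[i] + pi[i]
        - sum(mu[i][j] * (b[i][j] + V[j] - V[i]) for j in range(n) if j != i)
        for i in range(n)
    ]
```

As a sanity check, take a hypothetical two-state model (alive, dead) with constant force of mortality 0.02, force of interest 0.03, a death benefit of 1 and a single premium. The reserve while alive is the constant 0.02/(0.02 + 0.03) = 0.4, and the function returns zero derivatives at that value, as it should for a stationary reserve.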


19.5 Profit testing in multi-state models 


We now revisit Section 12.4.3 and adapt it to the multi-state case. As we did in that section, we 
will include notation for all the expenses, rather than incorporating them into other symbols. 
Suppose we have a general insurance contract based on a discrete time multi-state model 
providing for benefits of $b_k^{(ij)}$ at time k + 1 for transfer from state i to state j between time 
k and k + 1, and annuity benefits of $c_k^{(i)}$ paid at time k when in state i. We have a profit test 
basis at time k consisting of premiums $\pi_k^{(i)}$, percentage of premium expenses of $\rho_k^{(i)}$, periodic 
expenses of $e_k^{(i)}$, all paid at time k when in state i, and in addition a transition matrix $P_k$, 
interest rates $i_k$, and transfer expenses of $e_k^{(ij)}$ paid at time k + 1 for a transfer from state i to 
state j between time k and time k + 1. The profit testing procedure will conform precisely to 
the principles outlined in the single-life case. The profit for the period running from time k 
to k + 1, for a policy in state i at time k, is given by 


$$\mathrm{Pr}_{k+1}^{(i)} = \left({}_kV^{(i)} + \pi_k^{(i)}\left(1 - \rho_k^{(i)}\right) - c_k^{(i)} - e_k^{(i)}\right)(1 + i_k) - \sum_{j} p_{ij}(k, k+1)\left[b_k^{(ij)} + e_k^{(ij)} + {}_{k+1}V^{(j)}\right].$$


Note that this is just a straightforward generalization of the formula (12.4b) with the 
modification that we must provide reserves for the states transferred into, as noted in 
Section 19.4. 


Example 19.16 An insurance policy issued to a healthy life provides for death benefits 
at the end of the year of death of 100 000 if death occurs while healthy or 70 000 if death 
occurs while unhealthy. In addition there are yearly payments of 30 000 provided the insured 
is unhealthy. You are given reserves as follows, with state 0 being healthy and state 1 being 
unhealthy. 


$${}_5V^{(0)} = 25\,000, \quad {}_5V^{(1)} = 75\,000, \quad {}_6V^{(0)} = 30\,000, \quad {}_6V^{(1)} = 60\,000.$$


The profit test basis has an annual premium of 15 000 paid while healthy, an interest rate 
of 10%, periodic expenses paid each year of 200 if healthy or 400 if unhealthy, a transition 
matrix for the year running from time 5 to time 6 equal to the matrix $P_0$ of Example 19.1, and 
expenses for paying a death claim of 1% of the face amount. Find the profit $\mathrm{Pr}_6$ both when 
the policyholder is healthy at the end of 5 years and when the policyholder is unhealthy at 
the end of 5 years. 


Solution. 


$$\mathrm{Pr}_6^{(0)} = (25\,000 + 15\,000 - 200)(1.1) - 0.7(30\,000) - 0.2(60\,000) - 0.1(100\,000)(1.01) = 680.$$

$$\mathrm{Pr}_6^{(1)} = (75\,000 - 30\,000 - 400)(1.1) - 0.2(30\,000) - 0.6(60\,000) - 0.2(70\,000)(1.01) = -7080.$$
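
The two profit figures can be reproduced with a short Python sketch. The helper `period_profit` and its argument layout are assumptions for illustration: cash flows at the start of the year are netted into a single amount, and the end-of-year outgo for each destination state combines benefit, claim expense and new reserve.

```python
from fractions import Fraction as F


def period_profit(start_reserve, cash_in, interest, probs, outgo_by_state):
    """Accumulate start-of-year funds, then subtract expected end-of-year outgo.

    cash_in           : premium less annuity payment less periodic expense
    probs[j]          : probability of being in state j at the end of the year
    outgo_by_state[j] : benefit + claim expense + reserve needed in state j
    """
    accumulated = (start_reserve + cash_in) * (1 + interest)
    return accumulated - sum(p * x for p, x in zip(probs, outgo_by_state))


# End-of-year outgo for healthy, unhealthy, deceased (death claims carry 1% expense).
from_healthy = [30000, 60000, 100000 * F(101, 100)]
from_unhealthy = [30000, 60000, 70000 * F(101, 100)]

profit_healthy = period_profit(25000, 15000 - 200, F(1, 10),
                               [F(7, 10), F(2, 10), F(1, 10)], from_healthy)
profit_unhealthy = period_profit(75000, -30000 - 400, F(1, 10),
                                 [F(2, 10), F(6, 10), F(2, 10)], from_unhealthy)
```

Exact fractions are used so that `profit_healthy` and `profit_unhealthy` come out as exactly 680 and -7080.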


19.6 Semi-Markov models 


In this section, we discuss briefly the problem that arises when the Markov condition does 
not hold. This is a frequent occurrence, since it happens whenever the probabilities of 
movement from one state to another depend on the length of time elapsed since entering the 
current state. Models of this kind are known as semi-Markov models. A typical example is found in the 
healthy-unhealthy-deceased model. The likelihood of either recovery or death for an unhealthy 
person will certainly depend on how long they have been in the unhealthy state, a fact which 
was not considered in our previous models. 

We will not discuss these in detail. One method of approach is to approximate the 
semi-Markov model by a Markov chain, by dividing states up into several sub-states. For example, 
in place of a single state h for being unhealthy we could have states $h_1, h_2, \dots, h_n$, where the 
only transitions with positive probability within this collection of states are from $h_j$ to 
$h_{j+1}$. Therefore, a higher subscript is indicative of a longer period of being unhealthy. The 
probabilities of transition to both recovery and death could then differ between the sub-states. 
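
The sub-state construction can be sketched as follows. Everything specific here is an assumption made for illustration: the function name, the state ordering (healthy, then the sub-states, then deceased), the rule that a newly unhealthy life always enters the first sub-state, and in particular the treatment of the last sub-state, which is made to retain lives unhealthy for longer durations so that every row still sums to 1.

```python
def expand_unhealthy(healthy_row, recover, die):
    """Transition matrix on states [healthy, h_1, ..., h_n, deceased].

    healthy_row : (stay healthy, become unhealthy, die) for a healthy life
    recover[j]  : one-period recovery probability in sub-state h_(j+1)
    die[j]      : one-period death probability in sub-state h_(j+1)
    """
    n = len(recover)
    size = n + 2
    P = [[0.0] * size for _ in range(size)]
    stay, fall_ill, death = healthy_row
    P[0][0], P[0][1], P[0][size - 1] = stay, fall_ill, death
    for j in range(1, n + 1):
        r, q = recover[j - 1], die[j - 1]
        P[j][0] = r
        P[j][size - 1] = q
        # Assumption: h_n retains lives that have been unhealthy n periods or more.
        nxt = min(j + 1, n)
        P[j][nxt] += 1.0 - r - q
    P[size - 1][size - 1] = 1.0  # deceased is absorbing
    return P
```

For instance, `expand_unhealthy((0.7, 0.2, 0.1), [0.3, 0.1], [0.1, 0.3])` produces a four-state chain in which recovery becomes less likely, and death more likely, in the second period of ill health. The numbers are hypothetical.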


Notes and references 


For a more detailed account of multi-state theory than we have presented here, see Norberg 
(2008). 

Transition probabilities, which involve four variables, can be written in a myriad of 
ways, and various choices of notation are found in different works. In much of the actuarial 
literature, the standard form of notation is maintained. The basic symbol used is ${}_tp_x^{ij}$ to refer 
to the probability that a life now age x and in state i will be in state j at time t. In the notation 
of this chapter, we would consider the age (x) fixed and write the above probability as $p_{ij}(0, t)$, 
following the typical usage found in much of the probability theory literature. The actuarial 
notation has the advantage of following classical actuarial usage for probabilities, but seems 
restricted to models which refer to a body of lives, and not directly applicable to more general 
situations. 
See Jones (1997) for an interesting application of multi-state models. 


Exercises 


19.1 Redo Example 19.2, but assume now that in the first year, the probability that a healthy 
person will become ill is 0.3 rather than 0.2, and the probability that they will remain 
healthy is 0.6 rather than 0.7. 


19.2 Consider a Markov chain with transition matrices for the first three time periods 
given by 


$$P_0 = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}, \quad P_1 = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 1/2 & 0 \end{pmatrix}, \quad P_2 = \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 1/2 & 1/4 & 1/4 \\ 1/2 & 1/4 & 1/4 \end{pmatrix}.$$


Interest rates are given by $i_0 = 0.05$, $i_1 = 0.06$, $i_2 = 0.07$. Find the APV of each of 
the following contracts. 


(a) At time 0, the process is in state 0. A contract provides for payments at the end of 
the period of a transfer from state 0 to state 1 if this occurs within three periods. 
The payment is 100 in the first year, 200 in the second year, and 300 in the third 
year. Note that more than one payment can be made. 


(b) At time 0 the process is in state 0. A contract provides for a payment of 1000 at 
time k, k = 0, 1, 2, 3, provided the process is in state 0 at time k. 


19.3 A transition matrix for a three-state homogeneous Markov chain is given by 


$$P = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.7 & 0.2 & 0.1 \\ 0.2 & 0.7 & 0.1 \end{pmatrix}.$$


A process starts in state 0. Annuity contract 1 provides for periodic payments, provided 
the process is in state 1. Annuity contract 2 provides for periodic payments which 
start at the end of the period in which the process enters state 2, but stop completely 
upon exit from state 2 and do not begin again, even upon subsequent return. For each 
contract, find the probability that a payment is made at time 3. 


19.4 A transition matrix for a four-state homogeneous Markov chain is given by 


$$P = \begin{pmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.3 & 0.1 & 0.3 & 0.3 \\ 0.5 & 0.2 & 0.1 & 0.2 \\ 0.4 & 0.3 & 0.2 & 0.1 \end{pmatrix}.$$


At time 0, the process is in state 0. A contract provides for payments at the end of the 
year of the first transition to state 3 from any other state, provided this occurs within 
4 years. The amount of the payment is 100 if the transition is from state 0, 200 if 
the transition is from state 1, or 300 if the transition is from state 2. Level net annual 
premiums are paid beginning at time 0, and continuing until the time of transition to 
state 3. The interest rate is a constant 5%. 


(a) Find the annual premium. 


(b) Find the reserve at time 2, assuming the process is at that time in (i) state 0, (ii) 
state 1, (iii) state 2. 


19.5 Verify that the Kolmogorov forward equations with s = 0 give Equation (8.15) in the 
case of a single life, or Equation (11.10) in the multiple decrement model. 


19.6 Derive the Kolmogorov backward equations. For times s < t, 


$$\frac{\partial}{\partial s}\, p_{ij}(s, t) = -\sum_{k} \mu_{ik}(s)\, p_{kj}(s, t).$$


19.7 Redo Examples 19.8 and 19.12 now given that $\mu_{01} = 1$, $\mu_{02} = 3$, $\mu_{10} = 2$, $\mu_{12} = 3$. 


19.8 For a two-state chain, derive the matrix (19.15) directly by using eigenvalues and 
eigenvectors. 


19.9 In a two-state model, the forces of transition are given by $\mu_{01}(t) = 1$ for all t, while 


$$\mu_{10}(t) = \begin{cases} 1 & \text{if } 0 \le t < 1, \\ 6 & \text{if } 1 \le t. \end{cases}$$


Find the probability that the process will be in state 0 at time 2.4 given that it is in 
state 1 at time 1.4. 


19.10 A stationary Markov chain has the following transition matrix. 


$$P = \begin{pmatrix} 0.6 & 0.1 & 0.3 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.6 \end{pmatrix}.$$


Interest is a constant 5%. Find the APV of the following contracts based on this chain, 
both issued when the process is in state 0. 


(a) A temporary annuity provides 100 per year, beginning when the process enters 
state 1 and stopping when the process leaves state 1. Payments do not resume 
upon re-entry to state 1. The last possible payment is at the end of 4 years. 


(b) A 7-year term insurance pays 100 at the end of the year of the first transfer from 
state 0 to state 2, provided that prior to this time there was a transfer from 
state 1 to state 0. If a transfer from state 0 to state 2 occurs before a transfer 
from state 1 to state 0, then nothing is paid. 


19.11 Consider the chain of Figure 19.1. Suppose that (x) and (y) are two independent lives, 
where (x) is subject to a constant force of mortality a and (y) is subject to a constant 
force of mortality b. 


(a) Write down the reduced intensity matrix M' (as defined in Example 19.8). 
(b) Write down the reduced transition matrix P'(t) directly from Chapter 10 formulas. 
(c) Find the eigenvalues and eigenvectors of M'. 


(d) Verify that the rows of P'(t) are linear combinations of vectors of the form 
$e^{\lambda t}(a_1, a_2, a_3)$, where $\lambda$ is an eigenvalue of M' and $(a_1, a_2, a_3)$ is the corresponding 
eigenvector. 


19.12 Consider the chain of Figure 19.1. We do not necessarily assume independence. You 
are given the constant intensities $\mu_{01} = b$, $\mu_{02} = a$, $\mu_{13} = c$, $\mu_{23} = d$. 


(a) Given any positive u, v, find the probability that (x) survives u years and (y) 
survives v years, in terms of a, b, c, d. 


(b) Show that T(x) and T(y) are independent if and only if a = c and b = d. 


19.13 Consider a multi-state model for the three lives (x), (y), (z) where we have eight states 
described below, where the stipulated lives are those still living. 

State 0: all; State 1: (x), (y) only; State 2: (x), (z) only; State 3: (y), (z) only; State 4: 
(x) only; State 5: (y) only; State 6: (z) only; State 7: None. 

You are given that uo; = 0.02, Wo. = 0.01, uo; = 0.04, u34 = 0.02, u35 = 0.06, 
Hag = 0.05, ug; = 0.10 and the force of interest is a constant 0.05. 


(a) Find the actuarial present value of an insurance policy on the three lives which 
pays 1 at the moment of death of (z) provided that (x) dies first and (y) dies second. 


(b) Find the probability that the lives die in the order (x), (y), (z). 


19.14 A contract is based on a three-state chain. There are benefits at the end of the year of 
transfer, of 100 for transfer from state 0 to state 2, and 50 for transfer from state 1 
to state 2. Annual premiums of 10 are paid while the process is in state 0. There are 
annuity payments of 5 made each year when the process is in state 1. You are given 
that $i_5 = 0.10$ and the transition matrix 


$$P_5 = \begin{pmatrix} 0.6 & 0.1 & 0.3 \\ 0.2 & 0.7 & 0.1 \\ 0.2 & 0 & 0.8 \end{pmatrix}.$$


If 
$${}_5V^{(0)} = 50, \quad {}_5V^{(1)} = 35, \quad {}_5V^{(2)} = 70,$$
find ${}_6V^{(i)}$, for i = 0, 1, 2. 


19.15 Redo Example 19.16 with the following changes in the given data: 
(i) The transition matrix for the year running from time 5 to time 6 is now given 
by the matrix $P_1$ of Example 19.1. 


(ii) There is a payment at the end of the year of 20 000 when a healthy life becomes 
unhealthy, and the yearly income for unhealthy lives is reduced to 15 000. 


19.16 An insurance on two independent lives (x) and (y) provides for a death benefit at the 
moment of the second death. The death benefit is e°.°*’ for death at time t. Premiums 
are payable until the second death. The premium is level while both are alive and 
reduces to one half of the initial amount upon the first death. The life (x) is subject to 
a constant force of mortality of 0.10 and (y) is subject to a constant force of mortality 
of 0.15. The force of interest is a constant 0.10. Refer to the two-life model as given 
in Figure 19.1. 


(a) Using the method in Chapter 10, calculate for each of the first three states the 
annual premium and the reserve at time t. 


(b) Write down the Thiele differential equations. 

(c) Verify that your answers to (a) satisfy the equations in (b). 

19.17 In a certain four state chain, the forces of transition are given by 

Hoi (f) = 0.140.022, py, =0.24+0.01t, uo3() 2 0.140.032, 4 = 0.2 + 0.022. 
For all other i, j with $i \ne j$, $\mu_{ij}(t) = 0$. The force of interest is a constant 0.06. 


The following double integral represents the net single premium for a certain 
insurance contract. 


3 t 
1 J e™0-6+0.091+0.02)(0.1 + 0.025)(0.2 + 0.010ds dt. 
0 0 


Describe the benefits on the contract. 


19.18 Prove Theorem 19.4 by observing first that 


a(S, t + h) = pas, t) pult, t + h). 


20 


Introduction to the Mathematics 
of Financial Markets 


20.1 Introduction 


This chapter will introduce some basic concepts of modern mathematical finance. One goal 
is to cover the fundamentals of option pricing. This has become an important tool in 
actuarial mathematics, since many insurance and annuity contracts today contain so-called 
'embedded options', which we discussed in Section 13.2. For the most part, we carry this 
out in a discrete setting, but we do move into the continuous-time approach briefly in order 
to introduce the Black-Scholes-Merton formula. Another major objective in this chapter is 
to revisit the basic quantity of a discount function which we introduced early on. In the first 
part of the book, we treated this as a deterministic function, but a more realistic approach 
would be to consider v(s, t) as a random variable, reflecting the stochastic nature of investment 
returns that we discussed in Chapter 14. In particular, we seek a version of the key identity, 
Formula (2.1), in this stochastic setting. 

A prerequisite for this chapter is the starred Section 2.12. We assume familiarity with 
concepts discussed in that section such as short selling, forward contracts, and arbitrage. 


20.2 Modelling prices in financial markets 


A financial market is an institution designed to facilitate the trading of financial assets, such 
as stocks or bonds. Certain individuals wish to buy such assets, while others wish to sell them, 
and the financial market provides a forum to bring together the various parties. The potential 
holder of an asset quotes a desired selling price, known as the asking price, while the potential 
buyer quotes a desired buying price, known as the bidding price. When bidding and asking 
prices are equal it establishes a price for the asset, and a sale can be made. Our goal in this 
section is to develop a stochastic model for the evolution of prices. 

For most of this chapter, we assume a relatively simple framework, which will allow us 
to present the main ideas without too many technical mathematical difficulties. We assume a 
discrete-time model. That is, trading of assets will take place at integer times 0, 1, 2, .... The 
time period can be arbitrary, and we can think of it as possibly a very short time (say an hour or 
even a minute), which could then constitute an approximation to the more realistic 
continuous-time setting. We adopt a finite-time horizon, with time N as the last date we are interested in. 
Finally, we assume that the price of any asset can take on only finitely many possible values. 

Suppose we have M + 1 assets traded in our market, numbered from 0 to M. We will let 


$S_j(n)$ = the price of the j-th asset at time n. 


We consider each such price as a random variable. Therefore, our financial market is modelled 
by M + 1 discrete-time stochastic processes $S_j(n)$, where j = 0, 1, 2, ..., M and n = 0, 1, ..., N. 
We will single out a particular asset, often referred to as a bank account, as the asset numbered 
0. To describe this, we will first need to postulate in our model a nonnegative quantity r called 
the risk-free rate of interest, which is the interest rate that we can obtain on a risk-free 
investment as described in Section 2.12. 
We can then define this asset by 


$$S_0(0) = 1, \qquad S_0(n) = (1 + r)^n.$$


In other words, this is an asset which accumulates at the risk-free interest rate. It is a stochastic 
process in which each random variable takes a single value with probability 1. For simplicity, 
we are at first adopting a constant risk-free interest rate. The definition could be based on a 
more general discount function and we comment on this below in Section 20.13. 

We will make the same idealized assumptions that we made in Section 2.12. That is, we 
postulate that for each asset, any real number of the units can be bought at any trading date. 
Through short selling if necessary, this includes negative quantities. We also assume that there 
are no transaction costs such as commissions. 

Note that the existence of the bank account means that we are assuming that all participants 
in our market can freely borrow at the risk-free rate. 

Another simplifying assumption made throughout is that none of our assets provide any 
payments at intermediate dates, such as dividends on stocks or coupons on bonds. They 
provide funds only upon sale or maturity. 


20.3 Arbitrage 


An initial observation is that in the typical financial market, the various asset prices do not 
move independently. If asset i moves up in price, asset j may have a tendency to move up, 
or possibly to move down, or be certain to move up or down. It can be quite complicated 
to model all dependencies, but the no-arbitrage principle will often enable us to reduce the 
possibilities. In our stochastic models, this requires a more complicated definition than the 
one we gave in Section 2.12. 

We first define the concept of a trading strategy. This is roughly a description you, as an 
investor in the market, would give to an assistant before leaving for a holiday on a remote 
desert island where you cannot be reached. You would specify the number of units of each 
asset to be held at each time period. At each trading date after the initial portfolio is established 
at time 0, certain assets in the existing portfolio would be sold and others bought to achieve 
the stipulated amounts. These amounts could depend on the entire past and present history. 
The description could be enormously complicated or quite simple. For example, a trading 
strategy might be as follows: 

Start with an initial portfolio of 1 unit each of asset 1 and asset 2. Keep this intact until 
the first time that the price of asset 1 is above 40 per unit and the price of asset 2 is below 30 
per unit. At that time, sell all units of asset 1 and use the proceeds to buy shares of asset 2. 
These are then held without further trading. 

Note that the amounts to be held at time n can depend on all the prices of all assets at 
or before time n, but not after. It would not be a feasible trading strategy to specify that a 
certain asset should be sold at time 2 if the price of some other asset were below 40 at time 3. 

To formalize this somewhat, we can represent the asset holdings at any time n by a vector 


$$a(n) = (a_0(n), a_1(n), \dots, a_M(n)),$$


where $a_j(n)$ is the number of units of asset j held at time n. 
The entries of this vector are random, depending on the prices up to time n. So a trading 
strategy $\mathcal{T}$ is formally a vector of these random vectors, 


$$\mathcal{T} = (a(0), a(1), \dots, a(N-1)),$$


where each a(n) is a function of the values of $S_j(k)$ for j = 0, 1, ..., M and k = 0, 1, 2, ..., n. 

For any trading strategy and any time n, we will have a portfolio consisting of a certain 
number of units of each of our M + 1 assets. The portfolio at time n will then have a 
value V(n) obtained by multiplying the number of units of each asset by the price of that asset 
at time n, and summing. That is, 


$$V(n) = \sum_{j=0}^{M} a_j(n)\, S_j(n),$$


a random variable depending on all prices as well as the trading strategy as followed up to 
time n. Of course V(0) is a definite number as it is the cost of setting up the initial portfolio 
at time 0 when all prices are known. 

For any trading strategy, there is a reverse strategy which involves holding at each time 
the negative of the number of units held in the original strategy. In other words, one sells in 
place of buying and buys in place of selling. Formally, if a trading strategy is given by the 
vector $\mathcal{T}$, the reverse strategy is given by $-\mathcal{T}$. If $V^*$ denotes values for the reverse strategy, it 
is clear that $V^*(n) = -V(n)$ for all n. 

Here is another important concept. 


Definition 20.1 A trading strategy is said to be self-financing if for any trading date after time 
0 and before time N, the total price of all the assets sold on a given trading date exactly equals 
the total price of all the assets bought on that date, so no additional infusion or withdrawal of 
capital is required. 


For a self-financing strategy, the value of the portfolio at any intermediate trading date is 
the same before and after trading. 
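
The portfolio value and the self-financing condition can be checked mechanically along a single price path. The sketch below is an illustration only; the function names, the list-of-lists layout and the two-asset numbers in the note are assumptions.

```python
def portfolio_value(holdings, prices):
    """Value of a portfolio: units of each asset times its price, summed."""
    return sum(a * s for a, s in zip(holdings, prices))


def is_self_financing(holdings_by_time, prices_by_time, tol=1e-9):
    """Check the self-financing condition along one price path.

    holdings_by_time[n] : vector a(n) held from time n to time n + 1
    prices_by_time[n]   : vector of asset prices at time n
    """
    for n in range(1, len(holdings_by_time)):
        # Value just before and just after trading at date n must agree.
        before = portfolio_value(holdings_by_time[n - 1], prices_by_time[n])
        after = portfolio_value(holdings_by_time[n], prices_by_time[n])
        if abs(before - after) > tol:
            return False
    return True
```

Take a hypothetical market with a bank account (r = 0) and one stock priced 10, 12, 9 at times 0, 1, 2. Holding one share and then selling it at time 1 for 12 units of the bank account is self-financing; ending up with 15 units of the bank account at time 1 would require an extra 3 of capital and is not.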




We can now summarize the procedure we will be following in subsequent discussions. We 
set up an initial portfolio at time 0 for a cost of V(0), dictate a self-financing trading strategy, 
and retreat to the desert island, where no additional outlays of cash are required and none are 
received. Finally, the portfolio is liquidated at time N for proceeds of V(N), a random variable 
which depends on both the trading strategy and the evolution of prices. We let P denote the 
probability measure for V(N). 

The key definition of this section can now be given in terms of the starting value V(0) and 
the ending value V(N). 


Definition 20.2 The financial market admits arbitrage if there exists a self-financing trading 
strategy such that 


$$V(0) = 0, \quad V(N) \ge 0, \quad \text{and} \quad P[V(N) > 0] > 0.$$

A financial market which does not admit arbitrage is said to be arbitrage-free. 


In other words, an arbitrage opportunity is one where starting with a zero investment, we 
cannot possibly lose by the end of the trading period, and we have at least some chance of 
making a gain. Note that the arbitrage opportunity does not guarantee a positive gain. One can 
think of it as being given a lottery ticket for free. We cannot lose anything, and there is some 
chance of profiting. It is important to note that cases where there is a very small probability 
of loss do not constitute an arbitrage under this definition. The avoidance of the loss must be 
absolutely certain. 


Remark In the definition of arbitrage, we could replace the condition on V(N) by $V(N) \le 0$ 
and P[V(N) < 0] > 0, since, if this holds, the reverse strategy will satisfy the original 
condition. This looks a bit strange at first, but it simply says that if there is a strategy for which 
we are sure not to gain, then the reverse strategy is sure not to lose. 


An important consequence of the above is the following. 


Theorem 20.1 In an arbitrage-free financial market, if there is a self-financing trading 
strategy for which V(n) is a constant c for some n, then 


$$c = V(0)(1 + r)^n.$$


Proof. Modify the strategy by holding −V(0) units of the bank account at time 0, so that 
the new strategy has initial value 0. If necessary, modify the strategy further to stipulate that 
everything should be settled at time n, and the proceeds (possibly negative) left to accumulate 
in the bank account at the risk-free rate until time N. The new strategy will have the constant 
value of $c - V(0)(1 + r)^n$ at time n, and this must be equal to 0. If not, there would be a 
sure chance of having either a positive or negative amount at time N, which would imply an 
arbitrage opportunity, by Definition 20.2 and the remark following this definition. □ 


This was a reasonably simple result, but there is an important message behind it. It says that 
in the absence of arbitrage, if we can find a self-financing trading strategy which eliminates risk 
at some point, then our initial investment must accumulate at the risk-free rate up to that point. 

Remark It is true that any trading strategy can be converted into a self-financing one by 
using the bank account. An excess of the sales over purchases can be placed in the bank 
account, while excesses of purchases over sales can be handled by borrowing. However this 
gives a different strategy with a different amount held in the bank account at time N, and 
therefore a different value of V(N). The self-financing hypothesis is therefore essential in the 
definition of arbitrage. 


For our first example, we consider a very simple financial market. We will take N = 1, so 
a trading strategy involves simply specifying the initial portfolio. Our financial market has, in 
addition to $S_0$, a single risky asset $S_1$ consisting of a stock. We can assume, changing units if 
necessary, that the price of a unit of the stock is 1 at time 0. Suppose that the price of the stock 
at time 1 can only take two possible values, u or d (standing for 'up' and 'down' respectively) 
with d < u, each with positive probability. We call this a binomial model to reflect the two 
possible values at time 1. 


Theorem 20.2 The above financial market is arbitrage-free if and only if 


$$d < 1 + r < u. \tag{20.1}$$


Proof. Consider any trading strategy with V(0) = 0. If $\alpha$ is the number of units of stock in 
the initial portfolio, we must have $-\alpha$ units of the bank account. Then we will have either 
$V(1) = \alpha u - \alpha(1 + r)$ or $V(1) = \alpha d - \alpha(1 + r)$. Suppose (20.1) holds. If $\alpha < 0$ then the first 
such value will be negative and the second will be positive, while the reverse holds if $\alpha > 0$. 
If $\alpha = 0$, both values are 0. An arbitrage opportunity cannot exist. 

Conversely, if (20.1) is not true, then at least one of two possibilities holds. Suppose 
$d \ge 1 + r$. We create an arbitrage opportunity by choosing $\alpha > 0$, which makes both values 
of V(1) nonnegative, with at least one positive. The other possibility is that $u \le 1 + r$, in 
which case we similarly create an arbitrage opportunity by taking $\alpha < 0$. 


Note that the converse statement is intuitively obvious. If the inequality does not hold, 
then we can create an arbitrage opportunity by either buying a stock which is sure to yield 
more than the risk-free return, or short selling a stock which is sure to yield less than the 
risk-free return. 
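
The construction in the proof can be sketched in Python. The function names are assumptions for illustration; the stock holding returned is one unit long or short, which is only one of many possible arbitrage portfolios.

```python
def binomial_arbitrage(u, d, r):
    """Stock units for a zero-cost arbitrage in the one-period binomial model.

    Returns None when d < 1 + r < u, that is, when the market is arbitrage-free.
    """
    if d < 1 + r < u:
        return None
    # Buy the stock if it cannot underperform the bank account, otherwise short it.
    return 1 if d >= 1 + r else -1


def time_one_values(alpha, u, d, r):
    """Values V(1) in the up and down states for alpha units of stock and V(0) = 0."""
    return (alpha * (u - (1 + r)), alpha * (d - (1 + r)))
```

With hypothetical values u = 1.2, d = 0.9 and r = 0.05 the function returns `None`. With u = 1.2, d = 1.1 and r = 0.05 it returns 1, and the resulting values of V(1) are both positive.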

Another pertinent fact to notice in the definition of arbitrage is that the condition does not 
depend on the particular values of P but only on whether such values are positive or zero. We 
are therefore led to make use of the following standard definition of probability theory. 


Definition 20.3 Two probability measures P and Q on a sample space S are said to be 
equivalent if for all A ⊆ S, we have P(A) = 0 if and only if Q(A) = 0. (For readers familiar 
with the concept of an equivalence relation, one can readily verify that this is a legitimate 
such relation.) 

It is clear from the definition that a financial market is arbitrage-free with respect to P if 
and only if it is arbitrage-free with respect to any equivalent probability measure Q. 


20.4 Option contracts 


Given a financial market, we can do more than just buy or sell the existing assets. We have 
already seen one possibility, which is to enter into forward contracts. Another possibility 
is option contracts. These are in one sense similar to forward contracts since they are both 
transactions which involve trading of assets at a future date for prices that are specified now. 
There are major differences however, for in an option, unlike the forward contract, one party 
is not obligated to complete the transaction, but has the option to do so, and will only exercise 
this option if it is advantageous. There are two basic types of option contracts, known as calls 
and puts. A buyer of a call option has the right to buy a specified asset at a specified future 
time, known as the expiration date or exercise date, for a specified price, known as the strike 
price or exercise price, if they should choose to do so. The call option buyer has a similar 
motivation to a speculator taking a long position in a forward contract. They hope for a rise 
in price, so that they can buy the asset at a price which is lower than that prevailing at the time of 
purchase. If the price of the asset at the expiration date is below the strike price, the option 
will not be exercised. A buyer of a put option has the right to sell a specified asset on the 
expiration date for a specified strike price. The put option buyer has a similar motivation to 
the speculator taking a short position in a forward contract. They hope for a fall in price so 
that they can sell the asset for more than it is worth at the time of sale. In this case, if the price 
of the asset at the expiration date is above the strike price, the option will not be exercised. 
Unlike the forward contract, the call and put buyers are not on opposite sides. For each of 
them, there must be another party who sells or (as it is commonly said) writes the option, 
and agrees to complete the transaction should the option holder so elect. Now if the option is 
exercised, the option writer is necessarily selling or buying at an unfavourable price, and they 
are compensated for this by the option price which they receive from the option holder at the 
time the agreement is entered into. The option writers of course hope that options will not be 
exercised, so they profit by the full amount of the option price, and do not have to engage in 
an unfavourable transaction. Determining option prices is complicated, and will form much 
of the material of this chapter. 

It should be noted that what we have described are more properly known as European 
options, which specify that the option can only be exercised on the one specific expiration 
date. We will assume all options we discuss are of this nature unless specified otherwise. 
Another type of contract, known as an American option, allows for the exercise of the option 
at any time before or on the expiration date. These are more complicated and will be dealt 
with briefly in Section 20.7. 

Although the underlying assets for calls and puts are normally taken as financial instruments 
like stocks, as will be the case in our treatment, the basic idea of an option arises in many 
diverse contexts. For example, buying insurance on an asset like a house is essentially buying 
a type of put option. You are protecting yourself from a drop in value, not from market 
variation in this case, but rather from physical damage. Similarly, the guarantees for variable 
annuities (as discussed in Section 13.2) which protect your account against unfavourable 
investment experience constitute put options. For another example, suppose that you take out 
a long-term loan or mortgage, and the lender gives you the right to repay in full at any time 
without penalty. In effect, you have been given a call option. In this case, you are protected 
from a rise in the cost of repayment, which will occur if interest rates decline. (Refer to the 
discussion in Section 2.10.3.) 

In essence, protection against declines in the value of an asset that you own, while allowing 
you the full benefit of increases in value, can be viewed as being given a put option. Protection 
against increases in the value of an asset that you may wish to acquire in the future, while 
allowing you the full benefit of decreases in value, can be viewed as being given a call option. 
Note that in contrast, forward contracts protect you from unfavourable declines or increases, 
but they do not allow the parties to reap the full upside benefits, since the transactions must 
be completed with the agreed upon prices. 


20.5 Option prices in the one-period binomial model 


In this section, we show how the no-arbitrage principle allows us to calculate an option 
price for the one-period binomial market where condition (20.1) holds. We illustrate with a 
particular example. Suppose that a share of the stock is selling for 108 at time 0 and at time 1 it 
will be either 132 or 99, each with positive probability. Assume a risk-free interest rate of 10%. 

Consider a call option on the stock with an expiration date of time 1 and a strike price 
of 110. What should the price per unit of this option be? At first, one may think that there is 
no way to determine this exactly, and that it could take on many possible values. After all, 
the option is just another asset with its price being determined by the amounts bid and asked 
by the various market participants. The worth of this asset, however, is directly tied to the 
performance of the stock, so it should be clear that its price must be related in some way to 
the stock price. Such an asset is often termed a derivative security, since its value is derived 
from that of another security. 

To help determine the price, we take the following point of view. Purchasers of call options 
are not normally interested in actually taking possession of the stock at maturity. They simply 
want to buy it at the strike price, and sell it immediately for the higher market price if available. 
If the market price is below the strike price, the option is worthless and they receive nothing. 
The option then is just another asset $S_2$, with $S_2(1) = 132 - 110 = 22$ if the stock price goes 
up, or $S_2(1) = 0$ if the stock price goes down. The problem is to determine $S_2(0)$. 

Those well-versed in the actuarial models we discussed in earlier chapters may well think 
that we can determine $S_2(0)$ by simply taking a discounted expected value, as we did with 
several other similar sounding problems. That is, we simply take the price as 22vp where v is 
the discount factor for one period, and p is the probability that the stock goes up. We will first 
illustrate why one cannot solve the problem this way, and after that, we will, paradoxically, 
illustrate why one can do it this way. 

The first problem is that one is not given p as part of the model. All that we postulated 
about the probability measure P was that both of the possible outcomes at time 1 have positive 
probability. Indeed, there may not be any reasonable choice for a single value of p. The many 
different participants in the market may well have completely different assessments of this 
figure. It is not unusual to find two experts commenting on a particular stock, where one 
claims it is the best buying opportunity to come along in the last decade, and the other predicts 
imminent bankruptcy of the firm. 

The second problem is that one is not given v. Now the reader may take issue with this 
statement since we postulated a risk-free rate of 10% a few paragraphs back, so it appears as if v 
is simply $(1.10)^{-1}$. Use of this rate would imply that the buyer is looking for an expected return 
of 10% on their investment. However, 10% is the return for a perfectly risk-free investment. 
Investing in a call option is far from being risk-free. If the stock price at expiry is below the 
strike price, the entire investment is lost. It is to be expected that a rational option purchaser 
will want a return in excess of 10% as compensation for taking on the risk. (Recall that we 
discussed the same concept when introducing the risk discount rate in Section 12.4.) 

We will now solve the puzzle, and show that regardless of the assessment of p or of the 
desired yields of different individuals, the price of this option can only be 12. The reason is 
that one can in fact replicate the option for an initial investment of 12. That is, by investing 
only in the bank account and the stock, one can produce an outcome at the expiration date, 
which matches exactly the payouts of the option. This is done by buying 2/3 of a share of 
stock at time 0, which will cost 72. We can put in 12 cash, and borrow the additional 60. If 
the stock price is 132 per share at time 1, we sell our 2/3 of a share for 88, pay off the loan 
balance which is now 66, leaving us with 22. If the stock price is 99, we sell our 2/3 of a 
share for 66, and pay off the loan, leaving us with nothing extra. We have therefore exactly 
replicated the option for the price of 12. It is clear that no one would pay more than 12 
to buy this option. Similarly, nobody would sell the option for a price of less than 12, since 
instead they could reverse the above strategy and be in the same position at time 1 as if they 
had written the option, but they would have received 12 at time 0. 
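
The replication argument above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are our own, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

s0, s_up, s_down = 108, 132, 99        # stock prices from the example
r = Fraction(1, 10)                    # risk-free rate
strike = 110

payoff_up = max(s_up - strike, 0)      # option payoff if the stock goes up
payoff_down = max(s_down - strike, 0)  # option payoff if the stock goes down

# Match both payoffs: shares * S(1) + bank * (1 + r) = payoff in each state.
shares = Fraction(payoff_up - payoff_down, s_up - s_down)
bank = (payoff_down - shares * s_down) / (1 + r)   # negative means a loan
cost = shares * s0 + bank

print(shares, bank, cost)   # 2/3 -60 12
```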

Here is another point of view, which ties in with our previous definition of arbitrage. 
If we enlarge our financial market by adding the option as another asset $S_2$, then we must 
take $S_2(0) = 12$ to make this enlarged market arbitrage-free. To take a definite example, suppose 
the option price is 13. We will construct an arbitrage opportunity. Take the trading strategy 
which has as initial portfolio a(0) = (−59, 2/3, −1). The reader can verify that V(0) = 0. 
Now the bank account holding is worth −64.90 at time 1, so if $S_1(1) = 132$, then $S_2(1) = 22$ and V(1) = 1.1. If $S_1(1) = 99$ then 
$S_2(1) = 0$ and again V(1) = 1.1. We leave it to the reader to find an arbitrage opportunity if 
the option price is below 12. 
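
The arbitrage at an option price of 13 can be verified numerically. The helper below is a hypothetical illustration (its name and argument order are assumptions); it values a portfolio of bank account units, shares and options at time 0 and in both states at time 1.

```python
from fractions import Fraction

R = Fraction(11, 10)      # accumulation factor 1 + r for the bank account
STRIKE = 110

def portfolio_values(bank, shares, options, option_price):
    """Return V(0) and the pair of V(1) values (up move, down move)."""
    v0 = bank + shares * 108 + options * option_price
    v1 = tuple(
        bank * R + shares * stock + options * max(stock - STRIKE, 0)
        for stock in (132, 99)
    )
    return v0, v1

# Portfolio (-59, 2/3, -1) from the text, with the option priced at 13.
v0, v1 = portfolio_values(Fraction(-59), Fraction(2, 3), -1, 13)
print(v0, v1)
```

A zero initial value together with a strictly positive value in both states is exactly an arbitrage opportunity in the sense used above.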

Let us now go back to the proposed solution of 22vp as an option price, which we 
criticized a few paragraphs above. If we in fact use the risk-free rate and therefore take 
$v = 1.10^{-1}$, we will get the correct answer by using p = 0.6. Is there some way we could 
have discovered this probability of 0.6 beforehand? The answer is yes. Let us suppose that 
there exists a so-called risk-neutral individual, that is, one who ignores the risk and is happy to 
accept an expected 10% return on any investment, regardless of the degree of safety involved. 
Let p be the particular probability of rise in the stock price, which would be assumed by such 
a risk-neutral person. In order that this person would be willing to pay 108 for a share of 
stock, we should have that 


$$108 = 1.10^{-1}\,[132p + 99(1 - p)]$$


and solving we have indeed that p = 0.6. 

We have now discovered the important principle of risk-neutral valuation. The assignment 
here of 0.6 and 0.4 to the events of the stock going up or down, respectively, is known as a 
risk-neutral probability measure. It is the probability that must be assigned by a risk-neutral 
individual in order to justify buying the stock at the market price. Note that we are not saying 
that such a person necessarily exists, and indeed have stressed that most investors would 
be unlikely to possess such an attitude. We are only saying that if one did exist, the price 
of the underlying asset would necessarily imply a unique probability assessment for that 
individual. The principle then says that if we use the risk-free interest rate, along with the 
risk-neutral probability measure, then we can indeed value options by following the usual 
actuarial approach of taking a discounted expected value. 
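
A short sketch of risk-neutral valuation for this example follows. The function name is an assumption made for illustration; it simply solves the displayed equation for p and then discounts the expected option payoff at the risk-free rate.

```python
from fractions import Fraction

def risk_neutral_up_probability(s0, s_up, s_down, r):
    """Solve s0 = (p * s_up + (1 - p) * s_down) / (1 + r) for p."""
    return (s0 * (1 + r) - s_down) / (s_up - s_down)

r = Fraction(1, 10)
p = risk_neutral_up_probability(108, 132, 99, r)

# Discounted expected payoff of the call: 22 on an up move, 0 on a down move.
call_price = (p * 22 + (1 - p) * 0) / (1 + r)
print(p, call_price)   # 3/5 12
```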

Note carefully that the risk-neutral probability measure need not be the same or indeed 
have any particular relation to the original measure P, other than being equivalent in the sense 
defined above. Even if we had specified values for P, these would have had no effect on the 
resulting option price. The fact that one can risklessly replicate the option means that only the 
risk-neutral probability and the risk-free interest rate need be considered. 
This is a puzzling observation at first, and for those who are still skeptical, we will look 
into the situation a little further. We mentioned above that the probability p was not even 
specified as part of the model, but let us suppose it is. In fact, suppose that instead of a stock 
with uncertain returns, we have two lotteries each depending on the same random draw. A 
ball is drawn randomly from an urn containing two white balls and one red. The payoff from 
lottery 1 at time 1 is 132 if a white ball is drawn, or 99 if a red is picked. The payoff from 
lottery 2 at time 1 is 22 if a white ball is drawn or 0 if a red is drawn. So the true underlying 
value of p is now indisputable as 2/3. If the price for a lottery 1 ticket is 108, and we make the 
assumption that we can buy or sell any fraction of lottery 1 tickets, then the price for a lottery 
2 ticket must be 12, by exactly the same argument as given above, regardless of the known 
value of p. What does this imply for people who participate? Buyers of a ticket in lottery 1 are 
in effect earning an expected return of [(2/3)132 + (1/3)99]/108 − 1 = 12.04%. There is a 
reasonable extra return over the risk-free rate, to compensate for the risk taken on. Buyers of 
a ticket in lottery 2 are in effect earning an expected return of [(2/3)22/12] − 1 = 22.22%, a 
much higher return, which compensates for the greater risk in lottery 2 when the entire stake 
could be lost. Indeed for any value of p above 0.6, there will be a return above the risk-free rate 
in lottery 1 and an even higher return in lottery 2. It is only for the risk-neutral value of p equal 
to 0.6, for which the expected returns on both lotteries will coincide with the risk-free rate. 
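
The expected returns on the two lotteries can be tabulated for several values of p. This is a small illustrative sketch; the function name and the extra trial value p = 1/2 are assumptions.

```python
from fractions import Fraction

def expected_returns(p):
    """Expected one-period returns on lottery 1 (price 108) and
    lottery 2 (price 12) when a white ball has probability p."""
    lottery1 = (p * 132 + (1 - p) * 99) / 108 - 1
    lottery2 = (p * 22) / 12 - 1
    return lottery1, lottery2

for p in (Fraction(2, 3), Fraction(3, 5), Fraction(1, 2)):
    r1, r2 = expected_returns(p)
    print(f"p = {p}: lottery 1 {float(r1):.2%}, lottery 2 {float(r2):.2%}")
```

At p = 3/5 both returns equal the risk-free rate of 10%, while at p = 2/3 they are about 12.04% and 22.22%, matching the figures in the text.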

Going back to our original example with the stock, is it possible that an investor who 
assesses the probability of an upward movement as being less than 0.6 would still pay 108 
per share, thereby earning an expected return of less than the risk-free rate? This may seem 
irrational, but it is no more so than the behaviour of a vast number of people who buy 
lottery tickets or gamble in casinos at highly unfavourable odds. (For more on this topic, see 
Example 22.2.) 

We next derive a general formula. Suppose that (20.1) holds. As we did above, we can set 
up an equation to solve for p, the risk-neutral probability that the upward move will occur. 
Taking $S_1(0) = 1$, this is 


$$1 = (1 + r)^{-1}\,[pu + (1 - p)d], \tag{20.2}$$


which we solve to obtain 


$$p = \frac{(1 + r) - d}{u - d}, \qquad 1 - p = \frac{u - (1 + r)}{u - d}. \tag{20.3}$$


Note that condition (20.1) ensures that 0 < p < 1. 

The above procedure allows us to uniquely price, not only call options, but a general 
derivative security in this market, which pays an amount A if an upward move occurs or B 
if a downward move occurs. We do this in one of two ways: first, we can find a replicating 
initial portfolio consisting of $\alpha$ units of the stock and $\beta$ units of the bank account by solving 
the equations 


$$\alpha S_1(0)u + \beta(1 + r) = A, \qquad \alpha S_1(0)d + \beta(1 + r) = B. \tag{20.4}$$

Then 

$$\text{Price} = \alpha S_1(0) + \beta, \tag{20.5}$$


which is the cost of establishing the replicating portfolio. Secondly, and usually easier, we 
can bypass finding the replicating portfolio and just take the price as the discounted expected 
value of the payoff with respect to the risk-free interest rate and risk-neutral measure. 
That is, 


$$\text{Price} = (1 + r)^{-1}\,[pA + (1 - p)B], \tag{20.6}$$


where p is as given in formula (20.3). The reader can verify that both methods lead to the 
same answer. 
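
The agreement of the two methods can be checked with the following sketch. The function names are illustrative assumptions; the first solves the replication equations for the stock and bank holdings, and the second applies the discounted risk-neutral expectation. The claim with payoffs 7 and 3 is a made-up test case.

```python
from fractions import Fraction

def price_by_replication(s0, u, d, r, A, B):
    """Solve the two replication equations for the stock holding alpha and
    the bank holding beta, and return them with the set-up cost."""
    alpha = (A - B) / (s0 * (u - d))
    beta = (B - alpha * s0 * d) / (1 + r)
    return alpha, beta, alpha * s0 + beta

def price_by_risk_neutral(u, d, r, A, B):
    """Discounted expected payoff under the risk-neutral probability p."""
    p = ((1 + r) - d) / (u - d)
    return (p * A + (1 - p) * B) / (1 + r)

# Data of the opening example: the stock moves from 108 to 132 or 99.
s0 = Fraction(108)
u, d, r = Fraction(132, 108), Fraction(99, 108), Fraction(1, 10)

alpha, beta, cost = price_by_replication(s0, u, d, r, 22, 0)   # call, strike 110
print(alpha, beta, cost, price_by_risk_neutral(u, d, r, 22, 0))
```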


Example 20.1 For the example given at the beginning of this section, find the price of a 
put option with a strike price of 110. 


Solution. If the upward move occurs, the holder tears up the option. If the downward move 
occurs, the holder buys the stock for 99, and sells it for 110. So this is a derivative security with 
A = 0, B = 11. Directly from Equation (20.6), we have that the price is $1.10^{-1}(0.4 \times 11) = 4$. 
Alternatively, solve (20.4) to derive the replicating portfolio given by $\alpha = -1/3$, $\beta = 40$, and 
use (20.5) to get the same answer. To see this directly, we replicate the option for a cost of 4 
by selling 1/3 of a share short, receiving 36, and letting the total of 40 accumulate to 44 at time 1. 
This allows one to just cover the short position if the stock is up, or cover the short position 
and have 11 left over if the stock is down. 


There is in this case yet another way to obtain the answer. In fact, we develop a general 
formula relating puts and calls. 


Theorem 20.3 (Put-call Parity) Let y denote the cost of a call option and z denote the 
cost of a put option on the same stock with a current price of S(0), the same strike price of 
K, and the same expiration date N. Then 


$$S(0) + z - y = K(1 + r)^{-N}. \tag{20.7}$$


Proof. Suppose an investor at time 0 adopts the following trading strategy. Buy one unit of 
stock, sell one call option, buy one put option, and hold these without further trading up to 
the expiration date N. If S(N) > K, the put will expire worthless, the call will be exercised by 
the other party, so that the investor must give up the stock for a price of K. If S(N) < K, the 
call will expire worthless, the investor will exercise the put and sell the stock for a price of K, 
while if S(N) = K, both options are worthless and the value is just the stock price. Whatever 
happens, the value at time N of the portfolio will be K. Since V(0) is just the left side of 
Equation (20.7), the formula follows from Theorem 20.1. 


The proof shows in fact that this theorem is true for a general arbitrage-free market, 
and does not depend on the binomial assumption. In our present example, we know y = 12, 
N = 1, S(0) = 108 and K = 110, and we can immediately calculate that z = 4. 
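
Put-call parity is easy to confirm numerically. In the sketch below (function names are assumptions), the first function solves the parity relation for the put price, and the second evaluates the portfolio used in the proof at expiry.

```python
from fractions import Fraction

def put_price_from_parity(s0, call_price, strike, r, n):
    """Solve S(0) + z - y = K (1 + r)^(-N) for the put price z."""
    return strike * (1 + r) ** (-n) + call_price - s0

def portfolio_value_at_expiry(s_n, strike):
    """Value at time N of one share, one put bought and one call sold."""
    return s_n + max(strike - s_n, 0) - max(s_n - strike, 0)

z = put_price_from_parity(108, 12, 110, Fraction(1, 10), 1)
print(z)   # 4

# The portfolio is worth the strike price whatever the final stock price.
print([portfolio_value_at_expiry(s, 110) for s in (99, 110, 132)])
```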


20.6 The multi-period binomial model 


The model of the last section is clearly too simple to be representative of reality. As a further 
extension, we keep the binomial feature, but allow the prices to evolve over several periods. 
We assume that the price of the stock evolves each period as described in the one-period 
model above. That is, if the value is s at the beginning of a period, the price at the end will be 
either su or sd where d < u. So for an initial price of S(0) at time 0, the price at time 1 will be 
either S(0)u or S(0)d, and the price at time 2 will be either $S(0)u^2$, $S(0)ud$, or $S(0)d^2$, etc. We 
can represent this by what is known as a binomial tree. See, for example, Figure 18.1 which 
is an example with u = 1.2, d = 0.8. 

We can now consider any general contingent claim, which will be a payoff at time N 
which can depend on the entire history of up and down movements in the stock price. To 
formalize this, consider the sample space $\Omega$ consisting of all paths in the binomial tree. Each 
such path can be labeled by an N-termed sequence formed of the entries U and D, where U 
denotes an upwards branch and D a downwards branch, so there are $2^N$ paths altogether. We 
postulate that there is a probability measure P on $\Omega$, but we need not specify anything about 
it except that $P(\omega) > 0$ for all $\omega \in \Omega$. 

We now formally define the general type of derivative security we are interested in. 


Definition 20.4 A contingent claim is a contract which provides a payment at time N which 
is dependent on the particular outcome in our underlying sample space. It is modelled by a 
random variable X, where for $\omega \in \Omega$, $X(\omega)$ is the payment for outcome $\omega$. 


For example, a call option with strike price K and expiration date N on a stock with current 
price S(0) is a contingent claim given by 


$$X(\omega) = [S(0)u^m d^{N-m} - K]_+$$


whenever $\omega$ is a sequence with m upward movements and N − m downward movements. (For 
any real number t, the symbol $t_+$ denotes max{t, 0}.) 

A contingent claim can be more complicated than the options we have described up to 
now. Consider, for example, a lookback option on a stock which will return at expiry the 
maximum value of the stock over the period from time 0 to time N. So looking for example 
at Figure 18.1, we would have 


X(DUD) = 100, X(UDD) = 120, 


and so on. 
The multi-period model has the same essential features that we observed in the one-period 
model. 


* The financial market consisting of the stock and the bank account is arbitrage-free if 
and only if condition (20.1) holds. 


* Any contingent claim can be priced uniquely so as to prevent arbitrage. One method 
is to find a replicating self-financing trading strategy. The price of the claim is then 
the cost of setting up the initial portfolio for this strategy. A second way is to take the 
expected discounted value with respect to the risk-neutral probability measure Q on 
$\Omega$, which is simply the measure obtained by applying the appropriate probabilities p or 
1 − p as given by formula (20.3) to each branch of the binomial tree. That is, if $\omega$ has 
m entries of U and N − m entries of D, 


$$Q(\omega) = p^m (1 - p)^{N-m}.$$

These facts can be verified by using the results from the one-period model, and working 
backwards in time. The definition of contingent claim gives us directly its value at time N. 
We use those to determine the value and strategy applicable to each node at time N — 1, and 
then use these to determine the value and strategy applicable to each node at time N − 2, 
and continue to iterate the procedure until we get to time 0. 


Example 20.2 Consider a two-period model where the price of a stock evolves as shown 
by the tree in Figure 18.1 up to time 2, and r = 0.10. The contingent claim X is a call option 
at time 2, with a strike price of 92. So 


X(UU) = 52, X(UD) = X(DU) = 4, X(DD) = 0. 

Find the replicating strategy and the price of the option which will prevent arbitrage. 


Solution. Suppose at time 1, the value of the stock is 120. We know that u = 1.2,d = 0.8, and 
we can solve the system (20.4) with A = 52, B = 4 to get $\alpha = 1$, $\beta = -920/11$. This means 
that if the process is in the upper node at time 1, then in order to replicate the payoff at time 
2, we should own 1 unit of stock, and carry a debt of 920/11. The total value V(1) is 400/11. 

Similarly, if the value of stock is 80 at time 1, we solve the system (20.4) with A = 4, B = 0, 
and we arrive at a required portfolio of 1/8 units of stock and a debt of 80/11 for a total value 
V(1) of 30/11. 

We now move back to time 0 and again solve the system (20.4) with A = 400/11 and B = 
30/11, to obtain an initial portfolio consisting of 37/44 shares of stock and a debt of 7100/121. 
The value of this initial portfolio is V(0) = 3075/121 which must be the price of the option. 

To summarize, one can replicate this contingent claim by the following self-financing 
trading strategy. At time 0, buy 37/44 units of the stock, using 3075/121 of one's own capital 
and borrowing the remaining 7100/121 at the risk-free rate. At time 1, if the stock moves up, 
increase the stock holding to 1 unit, borrowing additional funds to do so. If the stock goes 
down, sell enough to reduce the stock holding to 1/8 unit, using the proceeds to partially 
repay the loan. 
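
The backward recursion used in this solution can be sketched in Python for a recombining tree. The function name and the indexing of nodes by (time, number of up moves) are assumptions made for illustration; at each node the code solves the one-period replication equations with the current stock price in place of the initial price.

```python
from fractions import Fraction

def backward_induction(s0, u, d, r, n_periods, payoff):
    """Price a claim depending only on the final stock price, and record
    the replicating holdings (stock units, bank units) at every node.
    Nodes are labelled (t, j), where j is the number of up moves so far."""
    values = [payoff(s0 * u**j * d**(n_periods - j)) for j in range(n_periods + 1)]
    strategy = {}
    for t in range(n_periods - 1, -1, -1):
        new_values = []
        for j in range(t + 1):
            s = s0 * u**j * d**(t - j)
            up_value, down_value = values[j + 1], values[j]
            alpha = (up_value - down_value) / (s * (u - d))
            beta = (down_value - alpha * s * d) / (1 + r)
            strategy[(t, j)] = (alpha, beta)
            new_values.append(alpha * s + beta)
        values = new_values
    return values[0], strategy

price, strategy = backward_induction(
    Fraction(100), Fraction(6, 5), Fraction(4, 5), Fraction(1, 10), 2,
    lambda s: max(s - 92, 0),
)
print(price)              # 3075/121
print(strategy[(0, 0)])   # holdings at time 0
```

A negative bank holding represents a loan, so the time 0 entry corresponds to 37/44 shares and a debt of 7100/121.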

This replicating strategy gives in addition a hedging strategy. Suppose you have just sold 
such an option. You run the risk that the stock will move up both periods. If you do not 
actually own the stock, you will be required to buy it at 144 and sell it at 92 (which shows 
the danger of selling a so-called naked option on a stock you do not own). If you follow the 
trading procedure outlined above, you will be sure to be able to meet your obligation in any 
event, assuming of course that the given model for the evolution of the stock price is correct. 

To calculate only the option price, rather than the complete replicating strategy, the second 
method can be used. That is 


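$$\text{option price} = (1 + r)^{-N} \sum_{\omega \in \Omega} X(\omega)Q(\omega). \tag{20.8}$$

For the particular case of a call option with strike K, this takes the form 

$$\text{option price} = (1 + r)^{-N} \sum_{m=0}^{N} \binom{N}{m} p^m (1 - p)^{N-m} \left(S(0)u^m d^{N-m} - K\right)_+, \tag{20.9}$$
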
where p is determined by formula (20.3). In our case p = 3/4, and we can verify that, as 
before, the price is 

$$(1.1)^{-2}\,[52(9/16) + 4(6/16) + 0(1/16)] = 3075/121.$$
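
The closed-form sum for a call option translates directly into code. The following is a minimal sketch (the function name is an assumption) using exact fractions and `math.comb` for the binomial coefficients.

```python
from fractions import Fraction
from math import comb

def binomial_call_price(s0, u, d, r, n, strike):
    """Discounted risk-neutral expectation of the call payoff, summed over
    the number m of up moves in n periods."""
    p = ((1 + r) - d) / (u - d)
    total = 0
    for m in range(n + 1):
        payoff = max(s0 * u**m * d**(n - m) - strike, 0)
        total += comb(n, m) * p**m * (1 - p)**(n - m) * payoff
    return total / (1 + r)**n

price = binomial_call_price(
    Fraction(100), Fraction(6, 5), Fraction(4, 5), Fraction(1, 10), 2, 92
)
print(price)   # 3075/121
```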


Example 20.3 In the example given above, suppose the interest rate is 0. Find a price and 
self-financing trading strategy for the so-called lookback option, which pays at time N = 2 the 
maximum value of the stock at times 0, 1, 2. 


Solution. The payoff is 144 for the outcome UU, 120 for the outcome UD, and 100 for each 
of the outcomes DU and DD. We can find the price exactly as we did for the option above. 
For r = 0 we can calculate p = 1/2 and 


Price = [144(1/4) + 120(1/4) + 100(1/2)] = 116. 


To find the trading strategy, we need a more complicated diagram. See Figure 20.1. In the 
original diagram (Figure 18.1), the paths of UD and DU both led to the same position at time 
2. This is fine in cases where the price of the stock at that point was all we were interested 
in, since this took the same value of 96 on both paths. However, in this case, the contingent 
claim is path-dependent and we need two different nodes to distinguish the two paths. 

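For the upper node at time 1, we need to hold $\alpha$ units of stock and $\beta$ units of bank account 
where 


$$144\alpha + \beta = 144, \qquad 96\alpha + \beta = 120,$$

so that 

$$\alpha = 1/2, \quad \beta = 72, \quad V(1) = 132.$$

For the lower node at time 1 we similarly solve 


$$96\alpha + \beta = 100, \qquad 64\alpha + \beta = 100,$$

so that 


$$\alpha = 0, \quad \beta = 100, \quad V(1) = 100.$$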


[Figure 20.1 Example 20.3. Tree diagram not recoverable; surviving node labels: 144, 120, 100.]

We are then back to a one-period model with an asset that has value 132 if the upward 
movement occurs, or 100 if the downward movement occurs. So the initial portfolio must 
have $\alpha$ units of stock and $\beta$ units of the bank account where 


$$120\alpha + \beta = 132, \qquad 80\alpha + \beta = 100,$$

so that 


$$\alpha = 0.8, \quad \beta = 36, \quad V(0) = 116.$$


So the trading strategy is to start with 116 (as we knew from the first solution above), buy 
0.8 units of stock and put the rest in the bank account. If the upward move occurs, sell 0.3 
units of stock, or if the downward move occurs, sell all 0.8 units of stock, in each case putting 
the proceeds into the bank account. 
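
For a path-dependent claim such as the lookback option, the risk-neutral expectation has to be taken over all paths rather than over final prices. The following sketch enumerates the paths explicitly; the function name is an assumption, and the payoff is passed the whole list of prices along the path.

```python
from fractions import Fraction
from itertools import product

def price_path_dependent(s0, u, d, r, n, payoff):
    """Discounted risk-neutral expectation of payoff(prices), where prices
    is the list of stock prices along one path of n up/down moves."""
    p = ((1 + r) - d) / (u - d)
    total = 0
    for path in product("UD", repeat=n):
        prices = [s0]
        for move in path:
            prices.append(prices[-1] * (u if move == "U" else d))
        ups = path.count("U")
        total += p**ups * (1 - p)**(n - ups) * payoff(prices)
    return total / (1 + r)**n

# Lookback option of this example: r = 0, payoff is the maximum price seen.
price = price_path_dependent(
    Fraction(100), Fraction(6, 5), Fraction(4, 5), Fraction(0), 2, max
)
print(price)   # 116
```

The loop visits every one of the 2^n paths, which is affordable for small n but grows quickly.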


For contingent claims which depend only on final prices, the first type of diagram (like 
Figure 18.1), known as a recombining tree, provides a significant reduction in computation. 
This was not readily apparent in our simple example where N = 2, but suppose instead that 
N = 10. The recombining tree would have 11 final nodes, while the more general version 
would have $2^{10} = 1024$ final nodes. 


20.7 American options 


We include a brief discussion here on American options, which can have some surprising and 
initially puzzling features. Recall that such an option can be exercised at any trading date up 
to and including the final expiration date. One's intuition tells us that the price of this should 
be greater than the price of the corresponding European option, since there is more choice and 
therefore a chance for more potential gain. However one's intuition is not completely correct. 
In fact, given our assumptions, an American call option should never be exercised prior to the 
expiration date, so in fact the two options are equivalent and should bear the same price. If 
r = 0, the same phenomenon holds for an American put option. It is never correct to exercise 
early. However, in the more usual case when r > 0, it may well be correct to exercise the put 
option early. We now clarify these rather curious facts. 

Consider a particular example. You hold an American call option to buy an asset for a 
strike price of 100 and on a certain trading date n < N, the asset price is 300. Your desire to take 
advantage of this high price might induce you to exercise the option, making an immediate 
profit of 200. After all, at a later date the asset price could be lower, with a corresponding 
reduced gain. However, one should not exercise the option, since there is a better way to take 
advantage of the higher price. You just sell the asset short, relying on the option to protect you 
against further increases, the usual danger with short sales. By doing this and waiting, you 
would receive 300 immediately. In the worst case scenario, you then have to buy the asset at 
maturity for 100 and settle your short position. But your gain as of the expiry date would be 
200 plus the interest earned on the entire 300 that you received at time n, in addition to an 
extra gain if the price is below 100 on the expiry date. If you exercised the option early, your 
gain at expiry would be limited to 200 plus the interest earned on the 200 received at time n. 
Note that this conclusion depends heavily on our reasonable assumption that r > 0 and is not 
true for negative interest rates. 
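
The comparison between the two courses of action for the call can be made concrete. In this sketch the interest earned over the period from n to N is a hypothetical 5%, and the function names and trial final prices are assumptions chosen only for illustration.

```python
from fractions import Fraction

def gain_exercise_early(interest):
    """Exercise at time n, sell the asset for 300, invest the 200 profit."""
    return 200 * (1 + interest)

def gain_short_and_wait(interest, final_price):
    """Sell short for 300 at time n and invest it; at expiry cover the short
    at the cheaper of the market price and the strike price of 100."""
    return 300 * (1 + interest) - min(final_price, 100)

interest = Fraction(1, 20)   # hypothetical 5% over the period from n to N
for final_price in (50, 100, 400):
    print(final_price, gain_exercise_early(interest),
          gain_short_and_wait(interest, final_price))
```

For any nonnegative interest the second gain is at least the first, since the difference is 100(1 + interest) less an amount of at most 100.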
Our argument also does not apply to dividend-bearing stocks, since it is possible that by 
exercising early and receiving the stock, the dividends paid will more than compensate for the 
loss of interest. It does show however that the only possible times when one should exercise 
are those which coincide with dividend payments. Exercising before such a date will at least 
incur the loss of interest up to the dividend payment date. We will not go into the complete 
analysis in this case. 

Now consider an American put option. Suppose now that at time n the strike price is 300 
and the price of the asset is 100. Should one take advantage of the low price, by buying at 
100 (assuming you do not already own the asset), then exercising the put to sell at 300 and 
making an immediate profit of 200? If the interest rate is 0, the answer is no, because similarly 
to the call option case there is a better way to take advantage of the current low price. In this 
reversed situation, we simply borrow 100 and buy the asset, relying on the put to protect us 
against future drops in the price. At expiry, we sell the asset for a minimum of 300, repay the 
loan, and have a gain of at least 200. So again with an interest rate of 0, the American and 
European puts are equivalent. But consider the more realistic case of a positive interest rate. 
Our previous argument does not hold now, since by waiting, we are paying out interest rather 
than receiving it as in the case of a call option. Suppose that in any event, we decide to borrow 
100 at time n to buy the asset, and the interest charged over the period from n to N is 5%. If 
we exercise immediately, we receive 300, which increases with interest to 315 by expiry, and 
after repayment of the loan, our gain at time N is 210. If we wait to exercise, we will be better 
off if and only if the asset price at expiry is higher than 315, in which case we keep the asset and tear up 
the option. So it is not immediately clear whether to exercise or not. 
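
The break-even price of 315 in the put example can be checked as follows. This is a sketch of the arithmetic only; the function names are assumptions.

```python
from fractions import Fraction

STRIKE, LOAN = 300, 100
interest = Fraction(5, 100)    # interest for the period from n to N

def gain_exercise_now():
    """Exercise at time n: 300 accumulates with interest, then repay the loan."""
    return STRIKE * (1 + interest) - LOAN * (1 + interest)

def gain_wait(final_price):
    """Wait until expiry: sell at the better of the market price and the
    strike price, then repay the loan with interest."""
    return max(final_price, STRIKE) - LOAN * (1 + interest)

print(gain_exercise_now())                 # 210
print([gain_wait(s) for s in (250, 315, 316)])
```

Waiting does better than immediate exercise only for final prices above 315.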

In our discrete model, we can effectively work out the price and trading strategy for 
an American put option by the same backwards induction process that we illustrated in 
Examples 20.2 and 20.3. One simply must do an extra comparison at each node. Suppose we 
have calculated data for all nodes at times greater than n and we are considering a node at 
time n. One first works out the strategy and a temporary value V exactly as in the European 
case. One then compares that value with what could be obtained from immediate exercise at 
time n. This is calculated by buying (or selling) a sufficient quantity of the asset so that you 
hold one unit and then selling that unit for the strike price. If this exercise value is greater 
than V, then that replaces V as the value, and the strategy is to exercise at that node. 

The following is a simple one-period example, which is sufficient to illustrate the technique,
since, in all cases, you just follow the procedure below at each node.


Example 20.4 An asset sells now for 100, and at time 1, will have a price of either 120 or 
80, both with positive probability. The risk-free interest rate is 0.10. Find as a function of K 
the price of an American put option with a strike price of K. Compare this with the price of a 
corresponding European put if (i) K = 113 and (ii) K = 102. 


Solution. The value at time 1 is $(K - 120)_+$ for an upward move and $(K - 80)_+$ for a
downward move. By (20.3), the risk-neutral probability of the downward move is 1/4. In the
extreme case that $K \le 80$, the option is clearly worth nothing. Take the other extreme where
$K \ge 120$. The value at time 0 in the European case would be
$[(3/4)(K - 120) + (1/4)(K - 80)](1.1)^{-1} = K(1.1)^{-1} - 100$, which is less than $K - 100$.
The price of the option is $K - 100$ and the strategy is to exercise immediately at time 0.

Now consider the case when $80 < K < 120$. The price will be the maximum of

$\{K - 100,\; 0.25(K - 80)(1.1)^{-1}\}$


where the second term is the price of the European put. We can solve to show that immediate
exercise is optimal precisely when


K > 1800/17 = 105.88.


So for K = 113 the price of the American put is 13 as compared to 7.50 for the European
put. When K = 102, the price of both options is 5.
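
The numbers in this solution are easy to confirm with exact arithmetic. The short Python sketch below is only a check of Example 20.4; the function name is hypothetical, and the down probability 1/4 and the factor 1.1 are the ones used in the solution.

```python
from fractions import Fraction


def one_period_put_prices(strike):
    """Return (American, European) put prices for the market of Example 20.4."""
    k = Fraction(strike)
    p_down = Fraction(1, 4)                       # risk-neutral probability of a down move
    payoff_up = max(k - 120, Fraction(0))
    payoff_down = max(k - 80, Fraction(0))
    european = ((1 - p_down) * payoff_up + p_down * payoff_down) / Fraction(11, 10)
    exercise_now = max(k - 100, Fraction(0))      # value of exercising at time 0
    return max(european, exercise_now), european


print(one_period_put_prices(113))   # American 13, European 15/2
print(one_period_put_prices(102))   # both prices equal 5
```

At K = 1800/17 the two candidates in the maximum coincide (both equal 100/17), which is the threshold quoted above.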


20.8 A general financial market 


We often wish to model situations which are much more complicated than the ones we 
considered in the previous section. For one thing, we may have several risky assets rather than 
one. For another, the evolution of prices may be given by a more involved structure than the 
binomial tree, and even in the binomial case, it may have a more complicated form than the 
constant up and down ratios of u and d. 

A typical example of such a general market is modelled as a tree-like evolution which will
apply to all assets. See for example Figure 20.2. At each time n we have a number of nodes,
each of which we can think of as representing a certain 'state of nature', and all the asset prices are
determined by this state. This market has three assets and the prices $(S_1(n), S_2(n))$ are shown
at each node. The asset 0 prices need not be shown as they are known completely once we
specify r.

We will now describe the general discrete-time model. The notation is necessarily somewhat
involved, but the market of Figure 20.2 is sufficient to capture the main ideas. We have
a finite state stationary Markov chain (see Section 18.2) with the following special structure.
The set of states is divided up into subsets $\mathcal{S}_0, \mathcal{S}_1, \dots, \mathcal{S}_N$, where $\mathcal{S}_k$ denotes the set of states
at time k. There is a single state $s_0 \in \mathcal{S}_0$, reflecting the fact that there is no uncertainty about
prices at time 0. The only transitions of positive probability are those from a state to another
one which is one period later. That is, given states i in $\mathcal{S}_k$ and j in $\mathcal{S}_m$, where m is not k + 1,
we must have $p_{ij} = 0$. For example, in our multi-period binomial model, there were k + 1
states at each time k, and for each state at time k, there were exactly two transitions into
a state at time k + 1. In the market of Figure 20.2, we would have $\mathcal{S}_0 = \{0\}$, $\mathcal{S}_1 = \{1, 2, 3\}$,
$\mathcal{S}_2 = \{4, 5, 6, 7, 8, 9, 10\}$.

In the binomial case, we described outcomes by sequences consisting of U and D. In the
general case, where we can have more than two branches from a node, we need a somewhat
different representation. The evolution of our system up to time k can be described by a
sequence of states $(s_0, s_1, \dots, s_k)$ where each $s_j \in \mathcal{S}_j$ and $p_{s_j s_{j+1}} > 0$ for $j < k$. We will call
such a sequence an admissible k-sequence. We now take our sample space $\Omega$ to consist of all
admissible N-sequences, and this is equipped with the probability measure P where for any
admissible N-sequence $\omega = (s_0, s_1, \dots, s_N)$, we have $P(\omega) = \prod_{j=0}^{N-1} p_{s_j s_{j+1}}$.

As an example, in the market of Figure 20.2, $\Omega$ consists of the 7 elements {{0, 1, 4},
{0, 1, 5}, {0, 2, 6}, {0, 2, 7}, {0, 3, 8}, {0, 3, 9}, {0, 3, 10}}.

Figure 20.2 A financial market with two risky assets


It is convenient to adopt the following notational device. For any admissible k-sequence
v with $k \le N$ we let $v^{\circ}$ denote the set of all $\omega \in \Omega$ which extend v. That is, if
$v = (s_0, s_1, \dots, s_k)$, then

$v^{\circ} = \{\omega \in \Omega : \text{the first } k + 1 \text{ entries of } \omega \text{ are } s_0, s_1, \dots, s_k\}$. (20.10)


For example, in the market of Figure 20.2, {0,3}° will denote the subset {{0,3,8}, {0,3,9},
{0,3,10}}.

We now want to capture formally the nature of quantities like asset prices or asset holdings
at a certain time k. These are random before time k, but are then known with certainty at time
k or later. For example, in the market of Figure 20.2, the price of asset 1 at time 1 is a random
variable which is uncertain at time 0, but then is known precisely at time 1. So it will take
the constant value of 120 on the set {0,1}° = {{0,1,4}, {0,1,5}}. Similarly it will take the
constant value of 100 on the set {0,2}° and the constant value of 80 on the set {0,3}°.
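
The sample space and the extension sets are simple to enumerate by machine. The following Python sketch does so for the tree of Figure 20.2, using the state labels quoted in the text; the dictionary `CHILDREN` and the function names are illustrative choices and not part of the model's notation.

```python
# Children of each state in the tree of Figure 20.2 (state labels as in the text).
CHILDREN = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7], 3: [8, 9, 10]}


def admissible_sequences(length, start=0):
    """All admissible sequences (s_0, ..., s_length) starting from state `start`."""
    paths = [(start,)]
    for _ in range(length):
        paths = [path + (child,)
                 for path in paths
                 for child in CHILDREN.get(path[-1], [])]
    return paths


def extensions(v, horizon=2):
    """The set of full paths whose first len(v) entries agree with v."""
    return [w for w in admissible_sequences(horizon) if w[:len(v)] == tuple(v)]


omega = admissible_sequences(2)
print(len(omega))             # 7
print(extensions((0, 3)))     # [(0, 3, 8), (0, 3, 9), (0, 3, 10)]
```

A random variable that is constant on each list returned by `extensions` for sequences of length k + 1 is known with certainty once the first k steps have been observed.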

We handle this in general by the following definition. 


Definition 20.5 For any integer k = 0, 1, ..., N, a random variable V defined on $\Omega$ is said to
be k-determined if V is constant on any set $v^{\circ}$ where v is an admissible k-sequence.


For a k-determined random variable V, we will write V(v) to denote the constant value of
V on the set $v^{\circ}$.

It follows from the definition that a k-determined random variable is also m-determined if 
m > k. This just reflects the fact that if we know something at time k, we will know it just as 
well at some later time. 

Note that a 0-determined random variable is just a constant, since the only admissible
0-sequence is that with the single entry $s_0$, and $s_0^{\circ}$ is the entire set $\Omega$.


Definition 20.6 For any random variable W, we define a k-determined random variable
$E_k(W)$ as follows. Suppose that the point $\omega \in v^{\circ}$. That is, v comprises the beginning k + 1
entries of $\omega$. We then define


$E_k(W)(\omega) = E(W \mid v^{\circ})$

using the notation of (20.10).


So $E_k(W)$ then just gives the expected value of W conditional on the first k steps of the
evolution.

It is clear from the definition that $E_k(W)$ is k-determined. $E_0(W)$, being 0-determined, is a
constant and just equal to the usual expected value E(W). Consider the other extreme where
k = N. Then for any $\omega$, the set $v^{\circ}$ is the single point $\omega$ and $E(W \mid \omega)$ is just $W(\omega)$, showing that
$E_N(W) = W$. So as k increases, $E_k(W)$ gives us more and more information about W until we
reach time N and know W exactly.


Example 20.5 Consider Figure 20.2. Suppose that where there are two branches emanating
from a node, the probability of an upward move is 2/3 and that of a downward move is 1/3,
while in the case of three branches emanating from a node, each has probability 1/3. Describe
the random variable $E_1(S_1(2))$.


Solution. Consider the set B = {0,1}°, which consists of two points, namely {0,1,4},
which has probability (1/3)(2/3) = 2/9, and {0,1,5}, which has probability (1/3)(1/3) = 1/9. We could
calculate that the conditional probabilities given B are (2/9)/(3/9) = 2/3 for {0,1,4} and
(1/9)/(3/9) = 1/3 for {0,1,5}. Observe now that we did not have to do all of this calculation, since the tree-like
structure makes it possible to read off these conditional probabilities from the future branches
of the tree without worrying about the past. In this case, with only one future step, they are
immediate. We then have that $E_1(S_1(2))$ takes the value of (2/3)(130) + (1/3)(110) = 123 1/3
on {0,1,4} and {0,1,5}. Similarly, it takes the value of 101 2/3 on {0,2,6} and {0,2,7} and
the value of 80 on {0,3,8}, {0,3,9} and {0,3,10}.
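
The calculation in Example 20.5 can be reproduced with a few lines of Python. The sketch below is illustrative only. The time-2 prices of asset 1 at states 4, 5, 8, 9 and 10 are those quoted in the text; the prices 105 and 95 at states 6 and 7 are an assumption, chosen only because they reproduce the stated value 101 2/3, since the figure itself is not available here.

```python
from fractions import Fraction

CHILDREN = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7], 3: [8, 9, 10]}
# Price of asset 1 at each time-2 state; the entries for states 6 and 7 are assumed.
S1_TIME2 = {4: 130, 5: 110, 6: 105, 7: 95, 8: 90, 9: 80, 10: 70}


def transition_probs(state):
    """Transition probabilities of Example 20.5 out of `state`."""
    kids = CHILDREN[state]
    if len(kids) == 2:
        # Two branches: upward move 2/3, downward move 1/3.
        return {kids[0]: Fraction(2, 3), kids[1]: Fraction(1, 3)}
    # Three branches: equal probabilities.
    return {kid: Fraction(1, len(kids)) for kid in kids}


def e1_of_s1_at_time2(state_at_time1):
    """Value of E_1(S_1(2)) on the paths through the given time-1 state."""
    probs = transition_probs(state_at_time1)
    return sum(p * S1_TIME2[kid] for kid, p in probs.items())


for state in (1, 2, 3):
    print(state, e1_of_s1_at_time2(state))   # 370/3, 305/3 and 80
```

Only the branches leaving the time-1 state are used, which is the point made in the solution: the conditional probabilities can be read off from the future of the tree.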

To summarize, a financial market with M + 1 assets (the bank account and M risky assets) and of
duration N is modelled by a Markov chain with the special structure as noted above, a probability
measure on the set $\Omega$ of all paths from time 0 to time N, a risk-free interest rate r, and random
variables $S_j(n)$, j = 0, 1, ..., M, n = 0, 1, ..., N, on $\Omega$ where each $S_j(k)$ is k-determined. A trading
strategy consists of a collection of random variables $a_j(n)$, j = 0, 1, ..., M, n = 0, 1, ..., N - 1, where
each $a_j(k)$ is k-determined.

For an important application to follow, we now turn to the concept of a martingale
introduced in Section 18.3, and look at conditions for this to occur in our present context.
Fix a probability measure Q on $\Omega$ which is equivalent to P and let $\{W_n\}$, n = 0, 1, ..., N, be a
sequence of random variables such that each $W_k$ is k-determined. We claim that this will be a
martingale, provided


$E_k(W_{k+1}) = W_k$, k = 0, 1, ..., N - 1. (20.11)


To see this, suppose that the above holds. Fix any k and a sequence of real numbers
$\{w_0, w_1, \dots, w_k\}$. Consider any set which has positive probability under Q and is of the form


$A = \{\omega \in \Omega : W_i(\omega) = w_i,\ i = 0, 1, \dots, k\}$.


Now by definition, membership in A is determined by what happens up to time k. If a sequence
$\omega \in A$, any sequence which has the same first k + 1 entries must also be in A. This implies
that A must be the union of subsets of the form $v^{\circ}$ for some admissible k-sequence v. For any
such v, it follows from our hypothesis (20.11) that


$E(W_{k+1} \mid v^{\circ}) = E_k(W_{k+1})(v) = W_k(v) = w_k$


and from (A.22) (applied to the sample space A with the conditional probability
$Q(\cdot \mid A)$), we can conclude that


$E(W_{k+1} \mid A) = w_k$,


showing that the sequence is a martingale.

To apply this, refer again to Figure 20.2. Let Q be the probability measure which assigns
1/3 to each transition when there are three transitions out of a state and 1/2 to each transition
when there are two. We can then see that the sequence of prices of asset 1 is a martingale under
this measure, by simply verifying the condition (20.11) at each node. For example, at state 3, we
have that the value of $S_1(1)$ is 80 and the value of $E_1(S_1(2))$ is (1/3)90 + (1/3)80 + (1/3)70 =
80. The same holds at all other states. Similarly, we can show that the same holds for $S_2(n)$,
the sequence of prices of asset 2.
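
Checking condition (20.11) node by node is mechanical, as the following Python sketch suggests for asset 1. It is an illustration under assumptions: the prices at states 1 to 5 and 8 to 10 are those quoted in the text, while the time-0 price of 100 and the prices 105 and 95 at states 6 and 7 are inferred values, since the figure is not reproduced here.

```python
from fractions import Fraction

CHILDREN = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7], 3: [8, 9, 10]}
# Asset 1 prices by state; the entries for states 0, 6 and 7 are inferred, not quoted.
S1 = {0: 100, 1: 120, 2: 100, 3: 80,
      4: 130, 5: 110, 6: 105, 7: 95, 8: 90, 9: 80, 10: 70}


def satisfies_20_11(prices, children):
    """Check E_k(W_{k+1}) = W_k at every non-terminal node, with equal branch weights."""
    for state, kids in children.items():
        expected_next = sum(Fraction(prices[kid], len(kids)) for kid in kids)
        if expected_next != prices[state]:
            return False
    return True


print(satisfies_20_11(S1, CHILDREN))   # True
```

The equal weights 1/3 or 1/2 on the branches are exactly the measure Q described above; changing any single price makes the check fail at the parent node.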


20.9 Arbitrage-free condition 


Deciding whether a general financial market is arbitrage-free directly from Definition 20.2
could be extremely complicated. We would have to consider all possible initial portfolios with
value 0 and all possible self-financing trading strategies. Fortunately, there is often a faster
way. Suppose we can find a probability measure Q, equivalent to P, such that for each asset
i, the sequence $(S_i(n))$ is a martingale under Q. Consider any self-financing trading strategy.
At any time n, the portfolio has a value V(n). For each asset i, the expected value at time
n + 1 will again be $S_i(n)$ and so the expected value of the portfolio before trading will be
V(n). Since our trading strategy is self-financing, the expected value after trading will again
be V(n). Since this is true for all possible values of the portfolio at time n, we must have
that $E_Q[V(n+1)] = E_Q[V(n)]$. (The subscript indicates that expectations are with respect to
the probability measure Q.) Working inductively, we have that $E_Q[V(N)] = V(0) = 0$. It is
impossible for V(N) to be nonnegative for all outcomes, have a positive probability of being
positive, and still have an expectation of 0, so we cannot have an arbitrage opportunity.

This seems like a nice simple answer but on the face of it there is a major problem. It 
is not reasonable to expect that our stochastic processes for stock prices are martingales, as 
we indicated in Section 18.3. In fact, the bank account, by definition, cannot be a martingale 
unless r = 0. So our result above may appear at first to be meaningless, but the following trick
saves the day. 

We do not have to measure our assets in terms of dollars. They can be expressed relative
to some other asset. Define


$\tilde{S}_j(n) = S_j(n)/S_0(n)$.


That is, $\tilde{S}_j(n)$ is the value of asset j at time n in terms of the bank account. We can think of
$\tilde{S}_j(n)$ as a discounted or present value, since it is what we would have to invest in our risk-free
bank account in order to accumulate to $S_j(n)$ at time n. It is a random variable rather than
a number since $S_j(n)$ is a random variable. The same argument we gave above clearly goes
through if each $\tilde{S}_j(n)$ is a martingale. This is now possible since $\tilde{S}_0(n)$ takes a constant value
of 1. We have therefore proved the 'if' direction of the following major result.


Theorem 20.4 (The fundamental theorem of asset pricing) A financial market is
arbitrage-free if and only if there is a probability measure Q on $\Omega$ which is equivalent to
P, and for which $\tilde{S}_j$ is a martingale for j = 1, 2, ..., M.


A major example is the multi-period binomial model, where the given risk-neutral
measure satisfies the conditions of the above theorem, as we verify from Equation (20.11).
Indeed, suppose that $\tilde{S}(k) = s$, so that $S(k) = s(1 + r)^k$. Referring to Equation (20.3), for any
$\omega \in \Omega$,


$E_k(S(k+1))(\omega) = s(1+r)^k \, \frac{u[(1+r) - d] + d[u - (1+r)]}{u - d} = s(1+r)^{k+1}$,


so that

$E_k[\tilde{S}(k+1)] = s$.
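
The algebra in this verification can be confirmed numerically. The sketch below assumes that Equation (20.3) gives the risk-neutral up probability in the usual form p = (1 + r - d)/(u - d); the function name is hypothetical.

```python
from fractions import Fraction


def discounted_expectation_ratio(u, d, r):
    """Ratio of the one-step conditional expectation of the discounted price to its current value."""
    u, d, r = Fraction(u), Fraction(d), Fraction(r)
    p = (1 + r - d) / (u - d)                  # assumed form of the risk-neutral up probability
    return (p * u + (1 - p) * d) / (1 + r)     # equals 1 exactly when the martingale condition holds


print(discounted_expectation_ratio("1.2", "0.8", "0.1"))   # 1
```

Because the ratio does not depend on k or on the current price s, the same check covers every node of the multi-period tree.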


As another application, we can conclude immediately from our observations in the preceding
section that the market of Figure 20.2 is arbitrage-free when r = 0.

Note that the risk-neutral probability p that we gave in the one-period binomial market
was the only possible value that would make $\tilde{S}$ a martingale, as shown by Equation (20.2).
The terminology is carried over and any probability measure Q satisfying the conditions of
Theorem 20.4 is known as a risk-neutral measure. The main conclusion of this section then
is that usually the best way to show that a given financial market is arbitrage-free is to
show the existence of a risk-neutral measure.

The converse of the fundamental theorem will be proved in the following section.


20.10 Existence and uniqueness of risk-neutral measures


20.10.1 Linear algebra background 


To complete our study of financial markets, we require a knowledge of some facts in linear 
algebra. We assume familiarity with the concept of a linear space (also known as a vector 
space) and linear subspaces. We also assume familiarity with the concepts of closed and 
bounded sets. Any basic text on multivariate calculus should contain the necessary details. 
The following is a brief review, adapted to our ultimate goals. 

Consider in particular the vector space W consisting of all real-valued functions defined
on some finite set S, with the operations of point-wise addition and scalar multiplication.
This is an n-dimensional space where n is the number of points in S. We let 0 denote the
function which takes the value 0 at each point of S. (The context should distinguish this from
the number 0.) For any f, g in W, we have an inner product


$f \cdot g = \sum_{s \in S} f(s)g(s)$.


A subset K of W is said to be convex if it contains the line segment joining any two of its
points. That is, given f and g in K and a scalar 0 < y < 1, the function yf + (1 - y)g is in K.

A hyperspace in W is a proper linear subspace of maximum dimension, that is one less than 
the dimension of the space. So for example, a hyperspace can be visualized in two-dimensional 
space as a line through the origin, or in three-dimensional space as a plane through the origin. 
We need the following two facts about hyperspaces. The first is a fairly standard result and 
not difficult to verify. The second is quite a bit more advanced. 


1. Any hyperspace H is determined by its so-called orthogonal vector. That is, there is an
element $q \ne 0$ in W such that


$H = \{h \in W : q \cdot h = 0\}$.


The element q is unique up to a scalar multiple. In two or three dimensions, we can
visualize it geometrically as a vector perpendicular to H.


2. Let L be any linear subspace of W and let K be a closed and bounded convex set that does
not intersect L. Then there is a hyperspace H containing L such that K does not intersect H.
It is simple enough to visualize this geometrically in three-dimensional space. If a
line does not intersect a closed and bounded convex set, we can find a plane containing
the line which does not intersect the set. This of course does not hold if the set is not
convex. Suppose that K is a doughnut-shaped region, and the line goes through the
hole. Then any plane containing the line must intersect K.


20.10.2 The space of contingent claims 


We return now to our model as described above and apply our linear algebra concepts. For a 
given financial market, define the following sets. Let 


W be the linear space of all real-valued functions on $\Omega$.

We can view this as the space of contingent claims, those payments at time N which are
determined by the particular path. An important subspace of W is given by


$L = \{f \in W : \text{there exists a self-financing trading strategy such that for all } \omega \in \Omega,\ f(\omega) = V(N)(\omega)\}$.


So L is the subspace of all replicable claims as defined above in Section 20.5. It is a linear
subspace since, given f and g in L, we can replicate f + g by just holding at each stage the
sum of the holdings in the trading strategies replicating f and g, and we can similarly achieve
any scalar multiple of f by multiplying our holdings by that scalar. Let


$L_0 = \{f \in L : \text{there is a replicating self-financing trading strategy for } f \text{ with } V(0) = 0\}$.


This is easily seen to be a linear subspace of L. We let


$K = \{f \in W : f(\omega) \ge 0 \text{ for all } \omega \in \Omega,\ \sum_{\omega \in \Omega} f(\omega) = 1\}$,


which is a convex subset of W.

So a nice linear algebra definition for a financial market to be arbitrage-free is to simply
say that $L_0$ does not meet K. (Of course any nonzero, nonnegative function f in $L_0$ represents
an arbitrage opportunity, but an appropriate scalar multiple of such an f will be in K and also
in the subspace $L_0$. We also use the fact that our original probability measure P must take a
positive value on each $\omega$.)

To illustrate, Figure 20.3 gives a geometric picture of the one-period binomial market.
Any contingent claim is represented by a point (f(U), f(D)) in the plane. The set K is the line
segment joining the points (0, 1) and (1, 0). The subspace $L_0$ is a proper subspace and therefore
must be a line through the origin. In the arbitrage-free case, this line will have negative slope
and not meet K. In the case of an arbitrage opportunity, $L_0$, as represented by the dotted line,
has a slope that is either nonnegative or equal to $\infty$, and it must intersect K. The picture also
makes it clear that L is the entire plane, as we noticed earlier, since it is a subspace that properly
contains $L_0$.


Figure 20.3 A picture of the one period binomial market (axes f(U) and f(D); the point (0, 1); the line $L_0$ in the arbitrage and arbitrage-free cases)


Figure 20.4 A market in which not all contingent claims are replicable (asset price 100 at time 0, moving to 120, 100 or 80 at time 1)

Further examples are furnished by the markets of Figures 20.4, 20.5 and 20.6. Assume
that r = 0. Alternatively, we can assume any positive r and interpret the asset values that are
given as $\tilde{S}_j(n)$ rather than $S_j(n)$. The conclusions will be the same in either case.

In the single risky asset market of Figure 20.4, the set $\Omega$ will have three points U, M, D
(for 'up', 'middle', 'down'). If the initial portfolio has $\alpha$ units of stock and $\beta$ units of the bank
account, the time 1 value of the portfolio will be $120\alpha + \beta$ for the upward movement, $100\alpha + \beta$
if the price stays the same, or $80\alpha + \beta$ for the downward movement. It follows that


$L = \{f : f(U) + f(D) - 2f(M) = 0\}$,

a two-dimensional subspace of W, showing that not all contingent claims are replicable in this
market. In particular, a call option with strike price 110 will have f(U) = 10, f(M) = f(D) = 0,
which is not in L.

For initial portfolios of value 0, we must have in addition that $100\alpha + \beta = 0$, leading to


$L_0 = \{f : f(M) = 0,\ f(U) + f(D) = 0\}$.
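
The description of L for the market of Figure 20.4 translates directly into a membership test. In the Python sketch below, the function names are hypothetical and r = 0 is assumed, so that one unit of the bank account is worth 1 at both times.

```python
def in_L(f_up, f_mid, f_down):
    """A claim is replicable in the market of Figure 20.4 iff f(U) + f(D) - 2 f(M) = 0."""
    return f_up + f_down - 2 * f_mid == 0


def replicating_portfolio(f_up, f_mid, f_down):
    """Return (alpha, beta) with 120a + b = f(U), 100a + b = f(M), 80a + b = f(D), or None."""
    if not in_L(f_up, f_mid, f_down):
        return None
    alpha = (f_up - f_down) / 40      # difference of the first and third equations
    beta = f_mid - 100 * alpha        # from the middle equation
    return alpha, beta


print(replicating_portfolio(10, 0, 0))     # None: the call with strike 110 is not replicable
print(replicating_portfolio(20, 0, -20))   # (1.0, -100.0)
```

In the second case the initial cost is 100(1) - 100 = 0, so that claim lies in the subspace of claims replicable at zero cost.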


Figure 20.5 A market that is not arbitrage-free (prices of the two risky assets: (100, 100) at time 0, moving to (130, 120) or (60, 80) at time 1)


Figure 20.6 See Exercise 20.7 (node values shown: 120, 100; 111, 91; 110, 90; 90, 70)


The intersection of $L_0$ and K is clearly empty, showing that the market of Figure 20.4 is arbitrage-free.
Of course we could have immediately deduced this from the Fundamental Theorem, since
assigning probabilities of 1/3 to each branch yields a risk-neutral measure.

Consider the two risky asset market of Figure 20.5. An initial portfolio with value 0 will
be given by a vector of the form $100(-(\alpha + \beta), \alpha, \beta)$. Then, a function f will be in $L_0$ if we
can find $\alpha$, $\beta$ satisfying


$30\alpha + 20\beta = f(U), \quad -40\alpha - 20\beta = f(D)$. (20.12)


These equations are readily solved to give $\alpha = -(f(U) + f(D))/10$, $\beta = (4f(U) + 3f(D))/20$.
It follows that $L_0 = L = W$. So this market is about as far from being arbitrage-free
as we could possibly get. Any contingent claim can be replicated for an initial cost of 0!
Obviously, the prices here as shown could not be maintained by rational investors.
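
To see concretely how any claim is obtained at zero cost in the market of Figure 20.5, one can plug the solution of system (20.12) back into the portfolio values. The Python sketch below does this with exact fractions; the function names are hypothetical and r = 0 is assumed.

```python
from fractions import Fraction


def zero_cost_holdings(f_up, f_down):
    """Solve system (20.12) for (alpha, beta) in the market of Figure 20.5."""
    f_up, f_down = Fraction(f_up), Fraction(f_down)
    alpha = -(f_up + f_down) / 10
    beta = (4 * f_up + 3 * f_down) / 20
    return alpha, beta


def payoff(alpha, beta):
    """Time-1 values of the zero-cost portfolio with values 100(-(alpha + beta), alpha, beta)."""
    bank = -100 * (alpha + beta)
    return (130 * alpha + 120 * beta + bank,    # upward move
            60 * alpha + 80 * beta + bank)      # downward move


alpha, beta = zero_cost_holdings(1, 1)          # a claim paying 1 in both states
print(alpha, beta, payoff(alpha, beta))
```

A sure payment of 1 for no initial outlay is of course an arbitrage opportunity in its plainest form.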

To see the delicacy of situations like this, look at this market again, but make the modification
that $S_1(1) = 70$ instead of 60 on the downward move. We leave it to the reader to verify that we
still have L = W, but $L_0$ is quite different. The coefficient of $\alpha$ in the second equation of
system (20.12) is -30 instead of -40, so that now


$L_0 = \{f \in W : f(U) + f(D) = 0\}$


as in the market of Figure 20.3, which shows that the market is arbitrage-free. We can also
deduce this immediately from the Fundamental Theorem, since now there is a risk-neutral
measure, Q(U) = Q(D) = 1/2.

The market of Figure 20.2 that we investigated in Section 20.8 is more complicated, and
it would involve a great deal of calculation to try to deduce $L_0$ exactly, although we do know
that it cannot meet K due to the arbitrage-free condition. It is possible to show that L = W, but
this is far from obvious from the figures as given. As a particular case, consider the following.


Example 20.6 Let X be the contingent claim that takes the value 60 on UU and 0
elsewhere. Take r = 0. Find a self-financing trading strategy to replicate this claim.


Solution. For any such strategy, each of the lower two nodes at time 1 will lead to a claim
of 0 at time 2, so by the martingale property, our portfolio must have value 0 at these nodes.
Therefore, looking at the pre-trading values at time 1,


$a_0(0) + 100a_1(0) + 90a_2(0) = 0, \quad a_0(0) + 80a_1(0) + 80a_2(0) = 0$,

which gives

$a_1(0) = -0.5a_2(0), \quad a_0(0) = -40a_2(0)$.


Similarly, the value at the upper node at time 1 must be 60 times the probability of an upward
move, which is 60(1/2) = 30. So


$a_0(0) + 120a_1(0) + 130a_2(0) = 30$,

and substituting from above we have

$a_0(0) = -40, \quad a_1(0) = -0.5, \quad a_2(0) = 1$.


To summarize, the trading strategy at time 0 is to buy 1 unit of asset 2, financing this by
selling 1/2 unit of asset 1, borrowing 40 and putting up the remaining 10. This checks out
since the initial cost must be 60Q(UU) = 60(1/3)(1/2). At time 1, if the middle or lower
branch occurs, sell the unit of asset 2, which is just enough to cover the short position and pay
off the loan.

We must now decide what to do at time 1 if the upper branch occurs. In this case, we have


$a_0(1) + 130a_1(1) + 160a_2(1) = 60, \quad a_0(1) + 110a_1(1) + 100a_2(1) = 0$.


There are several solutions to these equations, which indicates that a replicating self-financing
trading strategy need not be unique. One example is to take


$a_0(1) = -100, \quad a_1(1) = 0, \quad a_2(1) = 1$.


This strategy involves borrowing an additional 60 to cover the short position in asset 1. At
time 2, we pay off the loan of 100, and have either 60 or 0 left, depending on what happened
to asset 2 at time 2.
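
The strategy of Example 20.6 can be checked by evaluating the portfolio at each relevant node. The Python sketch below is only a verification aid: the node prices are those appearing in the equations of the solution, while the time-0 prices of 100 for both risky assets are inferred from the stated financing of the initial portfolio rather than read from the figure.

```python
from fractions import Fraction


def portfolio_value(holdings, prices):
    """Value of holdings (a_0, a_1, a_2) at asset prices (S_0, S_1, S_2)."""
    return sum(a * s for a, s in zip(holdings, prices))


a_time0 = (-40, -Fraction(1, 2), 1)    # strategy chosen at time 0
a_upper = (-100, 0, 1)                 # strategy chosen at the upper node at time 1

# Prices (bank account, asset 1, asset 2) with r = 0; the time-0 prices are inferred.
time0 = (1, 100, 100)
time1 = {"upper": (1, 120, 130), "middle": (1, 100, 90), "lower": (1, 80, 80)}
time2_from_upper = {"UU": (1, 130, 160), "UD": (1, 110, 100)}

print(portfolio_value(a_time0, time0))                                   # initial cost 10
print([portfolio_value(a_time0, p) for p in time1.values()])             # values 30, 0, 0
print([portfolio_value(a_upper, p) for p in time2_from_upper.values()])  # values 60, 0
# Self-financing at the upper node: the rebalanced portfolio costs what the old one is worth.
print(portfolio_value(a_upper, time1["upper"]) == portfolio_value(a_time0, time1["upper"]))
```

Any other solution of the two equations at the upper node would pass the same checks, which is the non-uniqueness noted in the solution.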


Similarly, for each of the other six paths, we could find the replicating strategy for a 
claim which pays off only on that path. We would take a suitable linear combination of these 
seven strategies to replicate any possible contingent claim. This will show that L = W. In
Section 20.11, we will prove a result which provides a much easier way to see this. 


20.10.3 The Fundamental theorem of asset pricing completed 


In this section, we prove the converse to the result established above, and show that in any 
arbitrage-free financial market, we can find a risk-neutral measure Q. We will proceed in two 
stages. 

Stage 1: Defining Q: 

By the arbitrage-free assumption, $L_0$ does not meet the set K, a convex, closed and bounded
set. By our linear algebra results of Section 20.10.1 we can find a hyperspace H containing
$L_0$ and not meeting K. Let q be an element orthogonal to H. That is, $H = \{h : q \cdot h = 0\}$.

Now it cannot be that for two distinct points f and g in K, we have $q \cdot f < 0$ and $q \cdot g > 0$,
for if so we could find y such that the function h = yf + (1 - y)g satisfies $q \cdot h = 0$ and so
$h \in H$. But by convexity $h \in K$, and this would contradict the fact that H does not intersect
K. So by a change of sign, if necessary, we can assume that $q \cdot f > 0$ for all $f \in K$. Now in
particular the functions $1_\omega$, which take the value of 1 on $\omega$ and the value 0 elsewhere, are in K,
and so we can infer that for all $\omega \in \Omega$,


$q(\omega) = q \cdot 1_\omega > 0$,


and by multiplying by a suitable scalar we can ensure that


$\sum_{\omega \in \Omega} q(\omega) = 1$,


which means that the function q is the probability function for a probability measure Q on $\Omega$.

Stage 2: Showing the martingale condition:

Fix any j. We will show that the stochastic process $\tilde{S}_j$ is a martingale under Q. For any time
n < N and any possible value s of $\tilde{S}_j(n)$, let A denote the event that $\tilde{S}_j(n) = s$, which means
that $S_j(n) = s(1 + r)^n$.

We must show that


$E_Q(S_j(n+1) \mid A) = s(1+r)^{n+1}$,

or equivalently that

$E_Q(Y \mid A) = 0$, (20.13)

where

$Y = S_j(n+1) - s(1+r)^{n+1}$.


Consider the following trading strategy. Do nothing before time n. If the discounted price of
asset j at time n is not equal to s, do nothing at all. If the discounted price at that time is s,
buy 1 unit of asset j, borrowing to do so, and sell it at time n + 1. Apply the proceeds to repaying
the loan and let the difference (which could be negative) accumulate in the bank account. Let g be
the function in W corresponding to this strategy.

If $\tilde{S}_j(n) = s$, this strategy yields a bank account of $S_j(n+1) - s(1+r)^{n+1} = Y$ at time n + 1.
Multiplying by $(1+r)^{N-n-1}$, the accumulated amount at time N is $(1+r)^{N-n-1}Y$ if the purchase
is made and 0 if the purchase is not made. Our trading strategy is self-financing and requires
an initial investment of 0. Therefore, the function g given by

$g(\omega) = \begin{cases} (1+r)^{N-n-1}\,Y(\omega) & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$


is in $L_0$. This means that


$0 = q \cdot g = \sum_{\omega \in A} q(\omega)(1+r)^{N-n-1}Y(\omega) = (1+r)^{N-n-1}\,Q(A)\,E_Q(Y \mid A)$.


The fact that s is a possible value of $\tilde{S}_j(n)$ implies that Q(A) > 0, and so we must have
$E_Q(Y \mid A) = 0$, establishing Equation (20.13).

One method of showing that a market is not arbitrage-free is to find the subspace $L_0$ and
show that it intersects K. But as we saw above, this can be computationally infeasible in all
but the simplest cases. The converse of the Fundamental Theorem provides an easier way.


Example 20.7 Use the above result to show that the financial market of Figure 20.5 is not 
arbitrage-free. 


Solution. Given a probability measure on $\Omega$, let q be the probability of an upward move.
For $S_1$ to be a martingale, we need 130q + 60(1 - q) = 100 so that q = 4/7. For $S_2$ to be a
martingale, we need that 120q + 80(1 - q) = 100 so that q = 1/2. Since these two values differ,
no risk-neutral measure exists, and so by the Fundamental Theorem the market is not arbitrage-free.
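
The test used in Example 20.7 is easy to automate for one-period markets with two branches. The Python sketch below is an illustration with hypothetical function names; it takes r = 0 and represents each risky asset by the triple (price now, price after an upward move, price after a downward move).

```python
from fractions import Fraction


def martingale_up_probability(price_now, price_up, price_down):
    """Solve q * price_up + (1 - q) * price_down = price_now for q."""
    return Fraction(price_now - price_down, price_up - price_down)


def one_period_risk_neutral_q(assets):
    """Return the common up probability if all assets agree and 0 < q < 1, else None."""
    qs = {martingale_up_probability(*asset) for asset in assets}
    if len(qs) == 1:
        q = qs.pop()
        if 0 < q < 1:
            return q
    return None


print(one_period_risk_neutral_q([(100, 130, 60), (100, 120, 80)]))   # None
print(one_period_risk_neutral_q([(100, 130, 70), (100, 120, 80)]))   # 1/2
```

The second call uses the modified market in which the downward price of asset 1 is 70, for which the measure Q(U) = Q(D) = 1/2 was found earlier.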


20.11 Completeness of markets 


In this section, we pose the following questions. Given an arbitrage-free market, can we price all
contingent claims by the two methods we had in the binomial model? Can we do so uniquely?

The uniqueness question is easily answered for an arbitrage-free market. As we showed in 
the binomial case, if we can replicate a contingent claim with a self-financing trading strategy, 
then the cost of that claim should be the cost V(0) of setting up the initial portfolio. What 
happens, however, if there are several different replicating self-financing trading strategies? 
This can certainly occur, but in an arbitrage-free market, they necessarily have the same V(0) 
which means a unique price. Suppose to the contrary that there were two replicating strategies 
for the same contingent claim, one with an initial cost of 100 and the second with an initial 
cost of 60. The investor could follow both the second strategy and the reverse of the first 
strategy for a net gain of 40 at time 0 which would be placed in the bank account, resulting 
in an overall initial value of 0. At time N, the payments on these two strategies would cancel, 
leaving a certain positive amount in the bank account, contrary to the fact that there were no 
arbitrage opportunities. 

We turn now to the existence question, beginning with a definition. 


Definition 20.7 A financial market is said to be complete if, given any contingent claim X, 
there is a self-financing trading strategy that replicates X. In other words, using the notation 
of the preceding section, the subspace L is all of W. 


We have already shown that the multi-period binomial market is complete. Moreover in 
Figure 20.4, we gave an example of an incomplete market. 
The following theorem gives a characterization of completeness for arbitrage-free markets. 


Theorem 20.5 An arbitrage-free market is complete if and only if there is a unique risk- 
neutral measure. 


Proof. Suppose that the market is complete. Let Q be any risk-neutral measure. Fix any
$\omega \in \Omega$. Let $X^{\omega}$ be the contingent claim that pays 1 if $\omega$ occurs and pays 0 for all other
outcomes, and choose a self-financing trading strategy that replicates $X^{\omega}$. If V(0) is the cost
of the initial portfolio, the martingale property ensures that


$V(0) = (1+r)^{-N} E_Q(X^{\omega}) = (1+r)^{-N} Q(\omega)$,

so that

$Q(\omega) = V(0)(1+r)^{N}$,


showing that Q is uniquely determined.
Conversely, suppose that the market is not complete, so that L is not equal to all of W. We
can then choose a nonzero function $h \in W$ such that


$h \cdot f = 0$ for all $f \in L$. (20.14)


(Since we can do this for a hyperspace, we can clearly do it for any proper subspace, which
is contained in some hyperspace.) The function which takes the constant value 1 is in L
(achieved by investing $(1+r)^{-N}$ in the bank account at time 0), so we must have that


$\sum_{\omega \in \Omega} h(\omega) = 0$. (20.15)


Let Q be the probability measure with the probability function q as constructed in proving
the 'only if' part of Theorem 20.4. We will produce a different function $q'$ with the same
properties as q, namely


$q'(\omega) > 0$ for all $\omega \in \Omega$, (20.16)

$\sum_{\omega \in \Omega} q'(\omega) = 1$, (20.17)

$q' \cdot f = 0$ for all $f \in L_0$. (20.18)


Then $q'$ will induce a second martingale measure $Q'$. To construct $q'$, we note that since
$q(\omega) > 0$ for all $\omega$, we can choose a positive number $\delta$ sufficiently small so that the function


$q' = q + \delta h$


satisfies (20.16), and since $h \ne 0$, $q'$ is different from q. In view of Equations (20.14) and
(20.15), it is clear that $q'$ also satisfies Equations (20.17) and (20.18).


To illustrate the proof of the last part, look again at the market of Figure 20.3. The function
$q(\omega) = 1/3$ for all $\omega$ gives us a risk-neutral measure. The space $L$ is, as we have seen, the set
of all functions such that

$$f(U) - 2f(M) + f(D) = 0$$

and the perpendicular function $h$ can be taken as

$$h(U) = 1, \qquad h(M) = -2, \qquad h(D) = 1.$$

We can then take $\delta$ to be any number strictly between $-1/3$ and $1/6$, which yields an infinite
number of risk-neutral measures.
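The admissible range for $\delta$ can be checked mechanically. The following is a minimal sketch using the values of $q$ and $h$ quoted above; the function names `perturbed` and `is_probability` are illustrative and not from the text. It only tests positivity and total mass, which is what restricts $\delta$.

```python
from fractions import Fraction

OUTCOMES = ("U", "M", "D")
# Uniform probabilities q and the perpendicular function h, as quoted in the text.
q = {w: Fraction(1, 3) for w in OUTCOMES}
h = {"U": Fraction(1), "M": Fraction(-2), "D": Fraction(1)}


def perturbed(delta):
    """Return q' = q + delta * h as a dict of exact fractions."""
    return {w: q[w] + delta * h[w] for w in OUTCOMES}


def is_probability(p):
    """True when every value is strictly positive and the values sum to 1."""
    return all(x > 0 for x in p.values()) and sum(p.values()) == 1


for delta in (Fraction(-1, 3), Fraction(-1, 4), Fraction(0), Fraction(1, 10), Fraction(1, 6)):
    print(delta, is_probability(perturbed(delta)))
```

The endpoints $-1/3$ and $1/6$ fail because one of the three probabilities becomes exactly zero there, which is why the interval is open.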

As a consequence of the above theorem, we can see immediately that the market of 
Figure 20.2 is complete, without going through the somewhat involved calculation of the 
replications that we did before. The probability assignment which we gave in Section 20.8 is 
clearly the unique risk-neutral measure. 
Incompleteness means that there are not sufficiently many assets to account for all the
variations in possible contingent claims. Comparing Figures 20.4 and 20.5, we see that with 
three branches we need at least three assets in order to achieve completeness. This explains 
also the fact that with only one risky asset, we need a binomial model to achieve completeness. 

For an additional example, consider the two risky asset market of Figure 20.5. We leave 
it to the reader to decide whether or not it is arbitrage-free, and whether or not it is complete. 

At this point, we summarize our conclusions. Suppose we have an arbitrage-free financial 
market. If the market is complete, then for any contingent claim X, there is a unique price 
which will prevent arbitrage opportunities. This can be found in one of two ways. First, choose 
a self-financing trading strategy to replicate X, and the price will be V(0). Second, take the 
expected value of the discounted claim $(1+r)^{-N}X$ with respect to the unique risk-neutral measure $Q$.
If the market is incomplete, then we can still find a unique price for the replicable contingent
claims. However, the non-replicable claims cannot be priced uniquely, since different choices
of a risk-neutral measure can give different results. Some other criteria must be used to arrive 
at prices. This does not mean of course that there are no restrictions on the price of such 
claims. Often, a range of values can be computed. 


Example 20.8 In the market of Figure 20.4, find the possible no-arbitrage prices for a call 
option with exercise date 1 and strike price 110. 


Solution. Add the option as another asset $S_2$ with a price of $z$ at time 0. For a martingale
measure which has probability $p$ of an upward movement and probability $q$ of staying the
same, we have $120p + 100q + 80(1 - p - q) = 100$, implying that $2p + q = 1$, so that $p < 1/2$.
Applying the martingale condition for the new asset gives $z = 10p$, which means that we must
have $0 < z < 5$. Conversely, all such values are admissible since for any such $z$ we obtain a
martingale measure for the enlarged market by taking

$$p = z/10, \qquad q = 1 - z/5.$$
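The solution can be verified numerically. The sketch below assumes, as the martingale equation in the solution implies, a zero interest rate and time-1 stock prices of 120, 100 and 80 from an initial price of 100; the names `measure_for_price` and `check` are illustrative only.

```python
from fractions import Fraction

STOCK = {"up": 120, "same": 100, "down": 80}   # time-1 stock prices used in the solution
STOCK_0 = 100                                   # time-0 stock price (interest rate assumed 0)
STRIKE = 110


def measure_for_price(z):
    """Probabilities (up, same, down) implied by the option price z."""
    p = z / 10
    q = 1 - z / 5
    return {"up": p, "same": q, "down": 1 - p - q}


def check(z):
    """True when z yields a strictly positive measure pricing both assets correctly."""
    m = measure_for_price(z)
    positive = all(x > 0 for x in m.values())
    stock_ok = sum(m[s] * STOCK[s] for s in STOCK) == STOCK_0
    option_ok = sum(m[s] * max(STOCK[s] - STRIKE, 0) for s in STOCK) == z
    return positive and stock_ok and option_ok


for z in (Fraction(0), Fraction(1), Fraction(5, 2), Fraction(49, 10), Fraction(5)):
    print(z, check(z))
```

The two pricing identities hold for every $z$; it is the positivity requirement alone that confines $z$ to the open interval from 0 to 5.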


In some cases, specifying the price of certain contingent claims will determine others. 
To illustrate, having added the additional asset and having specified z in Example 20.8, the 
resulting martingale measure is unique, so we have in effect completed the market. The price 
specified for the option will determine unique prices for all other contingent claims. 


20.12 The Black-Scholes-Merton formula


The discrete-time model illustrates many of the fundamental principles of pricing contingent 
claims. There is a limitation, however. To be at all close to a realistic model, we would 
need an enormously large number of time periods and the computations will quickly become 
intractable. This is apparent just from looking at the cases for $N = 2$ of the previous section. For
actual computation and greater realism, the preferred method is to use continuous-time models, 
which involve more advanced mathematical machinery. The relatively simple calculations 
effected by solving linear equations above will be replaced by solving differential equations. 
Sums of random variables are replaced by integrals of random variables, a concept that is 
technically very complex. The characterizations of arbitrage-freeness and completeness which 
we gave can be generalized to continuous-time settings but the proofs are far more difficult. 
A rigorous development of this is beyond the scope of the book. However, we do want to
investigate the Black-Scholes-Merton formula. This is a pivotal result which in fact initiated
much of the modern research into stochastic models in finance.

In this section, it is convenient to use the risk-free force of interest $\delta = \log(1 + r)$
instead of $r$.

We consider again the case of a financial market consisting of the bank account, and a 
single stock whose price at time $t$ is $S(t)$. The difference is that we now allow trading at any
time, and moreover, we allow asset prices to vary continuously. We therefore must model $S(t)$
as a continuous-time stochastic process and will in fact choose a geometric Brownian motion
process, as given in Section 18.7. That is, for some constants $\mu$ and $\sigma$,

$$S(t) = S(0)e^{\mu t + \sigma B(t)},$$


where B(t) is a standard Brownian motion. 

The problem is to find the price of a European call option with exercise date $N$ and strike
price $K$. One approach is to make use of the results that we already know for the discrete-time
model. Suppose we can find a probability measure $Q$ under which $S(t)$ is a martingale.
Then we can approximate our process with our discrete time multiple period binomial model
with periods of length $1/m$, $u = e^{\sigma\sqrt{1/m}}$, $d = e^{-\sigma\sqrt{1/m}}$, and with risk-free force of interest
$\delta/m$, provided we take $m$ sufficiently large. This should seem plausible since the log of our
binomial process is a random walk. In this discrete setting, the price of the option is given in
Equation (20.9). Putting in all the parameters and taking limits as $m$ goes to $\infty$, we arrive at
the celebrated Black-Scholes-Merton formula:

$$\text{Option price} = S(0)\Phi(a + \sigma\sqrt{N}) - Ke^{-\delta N}\Phi(a), \qquad (20.19)$$

where

$$a = \frac{\log(S(0)/K) + (\delta - \sigma^2/2)N}{\sigma\sqrt{N}}$$


and $\Phi$ is the c.d.f. of the standard normal distribution. An alternative method to derive the
formula is to directly calculate $e^{-\delta N} E_Q[(S(N) - K)_+]$. This is more straightforward, involving
basic calculus, although the calculation is somewhat involved. For both of these methods, we 
will omit the detailed derivations. 
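Although the derivations are omitted, the limiting argument can be illustrated numerically. The sketch below evaluates formula (20.19) and compares it with a risk-neutral binomial valuation using $u = e^{\sigma\sqrt{1/m}}$, $d = 1/u$ and a per-period accumulation factor $e^{\delta/m}$. It assumes that this standard binomial valuation is what Equation (20.9) expresses; the function names and the sample parameters are illustrative, not taken from the text.

```python
import math


def norm_cdf(x):
    """Standard normal c.d.f. Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(s0, k, delta, sigma, n):
    """Formula (20.19): force of interest delta, volatility sigma, duration n."""
    a = (math.log(s0 / k) + (delta - sigma ** 2 / 2) * n) / (sigma * math.sqrt(n))
    return s0 * norm_cdf(a + sigma * math.sqrt(n)) - k * math.exp(-delta * n) * norm_cdf(a)


def binomial_call(s0, k, delta, sigma, n, m):
    """Risk-neutral call price in a binomial tree with m periods per unit of time."""
    steps = int(round(m * n))
    u = math.exp(sigma * math.sqrt(1.0 / m))
    d = 1.0 / u
    p = (math.exp(delta / m) - d) / (u - d)      # risk-neutral probability of an up move
    total = 0.0
    for j in range(steps + 1):                   # j = number of up moves
        # Binomial probability computed in logs to avoid overflow for large trees.
        log_prob = (math.lgamma(steps + 1) - math.lgamma(j + 1) - math.lgamma(steps - j + 1)
                    + j * math.log(p) + (steps - j) * math.log(1.0 - p))
        payoff = max(s0 * u ** j * d ** (steps - j) - k, 0.0)
        total += math.exp(log_prob) * payoff
    return math.exp(-delta * n) * total


exact = bsm_call(100.0, 100.0, 0.05, 0.2, 1.0)
for m in (10, 100, 1000):
    print(m, binomial_call(100.0, 100.0, 0.05, 0.2, 1.0, m), exact)
```

As $m$ grows, the binomial values settle toward the closed-form price, which is the convergence the text appeals to.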

Note that the parameters to determine the option price are the risk-free force of interest $\delta$,
the strike price $K$, the duration $N$, and the quantity $\sigma$. The latter is a measure of the tendency
for prices of the underlying stock to vary and is known as the volatility of the stock. Note
however that the formula does not depend on $\mu$. We comment more on this below.

The resulting formula seems somewhat formidable, but we can provide some motivation.
The following is not intended as a rigorous exposition, but more to provide a method of
remembering and understanding the structure of the formula. First consider the random variable


$$X = \log[S(N)/S(0)]. \qquad (20.20)$$

What is its distribution under $Q$? From the definition of Brownian motion, we know that
under our original probability measure, it is normal with mean $\mu N$ and variance $\sigma^2 N$. If we
make the assumption that it is still normal under $Q$ with the same variance, and calculate the
resulting mean $M$, we know from Equation (A.58) that

$$E_Q[S(N)/S(0)] = e^{M + (\sigma^2/2)N}.$$

But, since the discounted stock price $\tilde{S}(t) = e^{-\delta t}S(t)$ is a martingale with respect to $Q$,

$$S(0) = E_Q(\tilde{S}(N)) = e^{-\delta N} E_Q(S(N)),$$

so that

$$E_Q[S(N)/S(0)] = e^{\delta N},$$

leading to

$$M = (\delta - \sigma^2/2)N. \qquad (20.21)$$


We conclude that under $Q$, we have a new drift parameter $(\delta - \sigma^2/2)$ which is completely
independent of the original drift parameter $\mu$. This is analogous to the observation in the
simple one-period binomial model where the original probability played no role in the option
price and it was only the risk-neutral probability that mattered.

Now let us ask, what is the probability under $Q$ that the buyer of this option will exercise
it? This will occur if $S(N) > K$. The probability of this is the same as the probability that
$X > \log(K/S(0))$, which equals the probability that $-X < \log(S(0)/K)$, which by the results
calculated above is

$$\Phi\left(\frac{\log(S(0)/K) + (\delta - \sigma^2/2)N}{\sigma\sqrt{N}}\right). \qquad (20.22)$$


Given our above assumption regarding the distribution of X, we have now identified the 
term $\Phi(a)$ in the Black-Scholes-Merton formula as the probability under the risk-neutral
measure that the option will be exercised. 

Now consider the following rather naive reasoning to arrive at the option price. If we
exercise the option, we will have an expected gain at time $N$ of $E_Q[S(N) - K]$ and this will
have a present value of $E_Q[S(N) - K]e^{-\delta N} = S(0) - Ke^{-\delta N}$. Multiply this by the probability of exercise to get

$$S(0)\Phi(a) - Ke^{-\delta N}\Phi(a).$$

This looks something like the actual Black-Scholes-Merton formula, but unless $\sigma = 0$,
the coefficient of $S(0)$ differs. Our naive approach, however, ignores the nature of the option
contract. With positive values of $\sigma$, the stock price at time $N$ will vary, and could be above or
below the exercise price. Calculating $E_Q(S(N) - K)$ will include negative values when the price
is below $K$, but there is no loss in these cases since the option will not be exercised. The naive
formula therefore understates the true price. Another way of expressing this is to note that
what we really want is the expected value of $S(N) - K$, given that the option is exercised.
To correct this understatement, it turns out that the coefficient of $S(0)$ must be increased
from $\Phi(a)$ to $\Phi(a + \sigma\sqrt{N})$. This makes sense as we would expect that this correction should
increase as the variability in the stock price and length of period increase.

For further insight and verification, we can directly verify the formula for $N = 0$. In this
case, $\Phi(a) = \Phi(-\infty) = 0$ if $S(0) < K$, and the formula gives an option price of 0, which it
should, while $\Phi(a) = \Phi(\infty) = 1$ if $S(0) > K$ and the formula gives an option price of $S(0) - K$,
which it should.

There is still another important feature involving the two terms in the Black-Scholes-Merton
formula, which we now present. Start with the following question. For the call option
as above, what is its value at any time $t$, $0 \le t \le N$? By the same argument as above, the
value will be given by the same formulas as above except that in both (20.19) and (20.20),
$S(0)$ is replaced by $S(t)$, which is the stock price at time $t$, and $N$ is replaced by $N - t$, the
remaining duration. It turns out that the two terms of this formula yield a replicating strategy
for the option, and therefore a hedging strategy, in terms of holdings of the stock and the
bank account. To explain this, we will take new units for the bank account, by adopting a
slightly different but equivalent point of view. We can view the bank account as an investment
in risk-free zero-coupon bonds maturing at time $N$. A unit of this asset will be a bond with
face amount $K$, so that its value at time 0 is $K(1 + r)^{-N}$. Let $a_t$ denote the value of (20.20)
with $S(0)$ replaced by $S(t)$ and $N$ replaced by $N - t$. The replicating trading strategy is to hold
$\Phi(a_t + \sigma\sqrt{N - t})$ units of the stock, and carry a short position of $\Phi(a_t)$ units of the bonds we
just described, at any time $t$. At each time $t$, the value of this portfolio will be precisely the
value of the option as given above and it will reach the right amount at time $N$. In the case
where the option is to be exercised the final portfolio will consist of 1 unit of stock, and a
short position of 1 bond. In the other case, the portfolio will be reduced to zero units for both
assets. It can be observed that the idea behind this strategy is similar to what we observed in
the call option of Example 20.2, where the replication is accomplished by buying stock as the
price increases, or selling as the price declines.

One must also show that this strategy is self-financing, using an appropriate modification 
of this definition to apply to continuous time. The idea is that at each instant, the amounts 
received from selling one of the assets is exactly what is required to buy the other. A precise 
definition is based on the formulation given in Exercise 20.3, with the differences replaced 
by differentials. This is not easy, however. The problem is that these are differentials of 
stochastic processes rather than deterministic functions, and a rigorous presentation involves 
some knowledge of the subject known as stochastic calculus. 


Remark In real life of course this replication by continuous rebalancing is not possible. 
One could try to get close by very frequent rebalancing but there is no guarantee that this 
discrete-time approximation is self-financing. It could well require additional amounts of cash 
or release such. But one can hope that if our model is sufficiently accurate that these extra 
amounts are relatively small, so that the option could be replicated for something close to the 
Black-Scholes-Merton price. 
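The size of these extra amounts can be explored by simulation. The following sketch starts with exactly the formula price, rebalances to the stock holding $\Phi(a_t + \sigma\sqrt{N - t})$ only at equally spaced dates, never adds or withdraws cash, and records the shortfall or surplus against the option payoff at time $N$. All parameter values, the seed, and the function names are illustrative assumptions; the stock follows $S(t) = S(0)e^{\mu t + \sigma B(t)}$ as in this section.

```python
import math
import random


def norm_cdf(x):
    """Standard normal c.d.f. Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_value_and_stock_units(s, k, delta, sigma, remaining):
    """Option value and stock holding at stock price s, with remaining = N - t > 0."""
    root = sigma * math.sqrt(remaining)
    a = (math.log(s / k) + (delta - sigma ** 2 / 2) * remaining) / root
    units = norm_cdf(a + root)
    value = s * units - k * math.exp(-delta * remaining) * norm_cdf(a)
    return value, units


def hedging_error(s0, k, delta, sigma, mu, n, steps, rng):
    """Terminal portfolio value minus option payoff with `steps` rebalancing intervals."""
    dt = n / steps
    s = s0
    value, units = call_value_and_stock_units(s, k, delta, sigma, n)
    cash = value - units * s                        # initial wealth equals the formula price
    for i in range(1, steps + 1):
        s *= math.exp(mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        cash *= math.exp(delta * dt)                # the bank account accrues interest
        if i < steps:
            _, new_units = call_value_and_stock_units(s, k, delta, sigma, n - i * dt)
            cash -= (new_units - units) * s         # trade stock against cash only
            units = new_units
    return units * s + cash - max(s - k, 0.0)


def mean_abs_error(steps, paths=400, seed=1):
    """Average absolute terminal mismatch over simulated paths."""
    rng = random.Random(seed)
    errors = [hedging_error(100.0, 100.0, 0.05, 0.2, 0.08, 1.0, steps, rng)
              for _ in range(paths)]
    return sum(abs(e) for e in errors) / paths


for steps in (5, 50, 200):
    print(steps, mean_abs_error(steps))
```

The terminal mismatch here plays the role of the additional cash required or released; under the model's assumptions it should shrink as rebalancing becomes more frequent, though it does not vanish for any finite number of trades.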


To conclude this section, we note that while the Black-Scholes-Merton formula has 
persisted as the main tool for option pricing, it is based on several simplifying assumptions. One
such assumption is that of the log-normal evolution of stock prices. There has been evidence 
to show that this is not completely realistic, and alternative models have been investigated. 
Another assumption is that both the risk-free force of interest and the volatility are constant 
and known, as these must be inserted into the formula to obtain numerical results. More 
realistic models have been proposed where these quantities are both considered as random. 


20.13 Bond markets 


20.13.1 Introduction 


We now return to the discrete-time case. In this section, we deal with markets where the 
assets are risk-free zero-coupon bonds. These were already introduced in Section 2.12, where 
we considered forward prices, but here we want to consider the actual prices which will be
random variables. A major result will be to give an appropriate version of Equation (2.1) to ensure no
arbitrage in the case of stochastic discount functions.

Our market will consist of $N + 1$ assets where for $n \neq 0$, asset $n$ is a zero-coupon bond
maturing for 1 at time $n$. We will define $S_0$ later.

We will denote the random variable $S_n(k)$ by $v(k,n)$. For $k < n$, $v(k,n)$ is the price you
would pay at time k to receive 1 at time n, which ties in well with our original use of this 
notation in Chapter 2, as well as our notation for forward prices. These are not the forward 
prices however, but the actual prices. Of course v(0, n) is a real number and the same as the 
forward price v(0, n), since at time 0 we know the prices of the bonds. But future prices are 
unknown and therefore modelled as random variables. 

A question which may come to mind now is whether the fundamental Equation 2.1 could 
hold for these quantities only interpreted as a multiplication of random variables. The answer 
is no. Observe, for example, that with randomness 


$$v(0,m) \neq v(0,n)v(n,m),$$


since the left side is a real number while the right side is truly random. 

Now in order to avoid a completely trivial situation, we cannot assume a constant deterministic
risk-free rate $r$ as we did before. Before discussing how we modify this idea, we want
to recall how risk arises from investment in risk-free assets like bonds (elaborating on the
discussion at the end of Section 2.10.3). Note first that unlike a stock, where a value on any
future date is unknown, the payoff on a risk-free bond is absolutely certain if held to maturity.
The risk arises if one wants to buy or sell before that date. For example, if the risk-free interest
rate today is 5%, then the price of a 3-year bond will be $(1.05)^{-3} = 0.864$. If at time 1, the risk-free
interest rate has risen to 8%, then the value of the bond drops to $(1.08)^{-2} = 0.857$, whereas it
would have risen to $(1.05)^{-2} = 0.907$ had the rate stayed at 5%. Buying
long-term bonds therefore carries the risk of a rise in interest rates, which can lower the price.
However, in our present model, where trading occurs only at integer times, buying bonds
which mature in 1 period does not carry any risk. This leads us to define the bank account by

$$S_0(0) = 1, \qquad S_0(n) = [v(0,1)v(1,2)\cdots v(n-1,n)]^{-1}, \quad n \ge 1. \qquad (20.23)$$

Our bank account is formed by starting with 1 at time 0, using that to buy a bond maturing
at time 1, then taking the proceeds of $v(0,1)^{-1}$ to buy a bond maturing for $v(0,1)^{-1}v(1,2)^{-1}$
at time 2, and continuing to roll over the account into a new 1-year bond each year. In the
case of a constant and deterministic interest rate of $r$, we would have $v(k,k+1) = (1 + r)^{-1}$,
and this definition of $S_0$ reduces to the one given before.
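The roll-over construction is easy to trace along a single path of one-period bond prices. The sketch below is a minimal illustration; the function name `bank_account` and the sample prices are assumptions made for the example.

```python
def bank_account(one_period_prices):
    """Values S_0(0), ..., S_0(n) given v(0,1), v(1,2), ..., v(n-1,n) along one path."""
    values = [1.0]
    for v in one_period_prices:
        # Each unit of the one-period bond costs v and pays 1, so the balance grows by 1/v.
        values.append(values[-1] / v)
    return values


# Constant deterministic rate r = 5%: the account reduces to (1 + r)^n.
print(bank_account([1 / 1.05] * 3))

# One path of random one-period prices (illustrative numbers).
print(bank_account([0.7, 0.8]))
```

With a constant rate the output reproduces the usual accumulation $(1+r)^n$, while along a random path the balance depends on which one-period prices were realized.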

Note that under this new definition, $S_0(k)$ is not a definite amount, but strictly random for
$k > 1$.

Using the bank account, we can extend the definition of $v(k,n)$ to all ordered pairs $(k,n)$
by defining

$$v(k,n) = S_0(k)/S_0(n), \quad k > n,$$

since this is the amount accumulated at time $k$ by taking 1 from a bond maturing at time $n$
and placing it in the bank account. Note that with this definition, all prices at the final time
$N$ are determined by the previous values of $S_0(n)$ (and of course the fact that $v(N,N) = 1$).
This means that we can model our bond market with a tree going up to only time $N - 1$. (See
Example 20.9 below.)


20.13.2 Extending the notion of conditional expectation 


For our analysis, we will need some additional results about $k$-determined random variables.
We will in fact investigate this idea in more generality and further extend the discussion in
Section A.8, since the same ideas are needed in Chapter 24. Suppose we are given a sample
space $\Omega$, a probability measure $P$ on $\Omega$ and a partition $\Pi = \{B_1, B_2, \ldots, B_m\}$ of $\Omega$ into pairwise
disjoint sets with union equal to all of $\Omega$, such that for all $i$, $P(B_i) > 0$. For any random variable
$W$ on $\Omega$, define a random variable $E_\Pi(W)$ as follows. For $\omega \in B_i$,

$$E_\Pi(W)(\omega) = E(W|B_i).$$

To illustrate, in the model of Section 20.8, fix $k$ and take the partition $\Pi$ which consists of
all sets $v^*$ where $v$ is an admissible $k$-sequence. (For example, in the market of Figure 20.2
if we take $k = 1$, the partition will consist of the three sets $\{0,1\}^*$, $\{0,2\}^*$, $\{0,3\}^*$.) Then
$E_\Pi(W)$ is just $E_k(W)$ as defined in that section.

We summarize the facts that we need in the following theorem. 


Theorem 20.6 Take any random variables $V$, $W$ and a scalar $c$.

(a) $E_\Pi$ is linear. That is,

$$E_\Pi(V + W) = E_\Pi(V) + E_\Pi(W), \qquad E_\Pi(cV) = cE_\Pi(V).$$

(b) Suppose that $W$ is a random variable which is constant on each subset of $\Pi$. Then

$$E_\Pi(WV) = WE_\Pi(V).$$

(c) Suppose that $\Pi'$ is a finer partition than $\Pi$, which means that every set in $\Pi$ is a union
of sets in $\Pi'$. Then

$$E_{\Pi'}[E_\Pi(W)] = E_\Pi[E_{\Pi'}(W)] = E_\Pi(W).$$


Proof. (a) This follows directly by applying (A.23) and (A.8) to each subset $B_i$ of the partition
equipped with the probability measure $P(\cdot|B_i)$.

(b) As in (a), apply (A.8) to each set of the partition.

(c) It is clear from the definitions that if a random variable is constant on each subset of
$\Pi'$, then applying $E_{\Pi'}$ to it will leave it unchanged. Now $E_\Pi(W)$ is constant on each subset of $\Pi$,
therefore constant on each subset of the finer partition $\Pi'$. It follows that $E_{\Pi'}[E_\Pi W] = E_\Pi W$.
To show the other order of composition, choose any set $A$ of the partition $\Pi$ and let $A$ be the
union of sets $B_1, B_2, \ldots, B_m$ where each $B_i$ is a set of the partition $\Pi'$. To simplify the notation
denote $E_{\Pi'}(W)$ by $Z$. Now by definition, $Z$ takes the constant value of $E(W|B_i)$ on each set $B_i$
so that clearly

$$E(Z|B_i) = E(W|B_i).$$




Now apply (A.29) to the sample space $A$ equipped with the probability measure $P(\cdot|A)$. We
have that

$$E(Z|A) = \sum_{i=1}^{m} E(Z|B_i)P(B_i|A) = \sum_{i=1}^{m} E(W|B_i)P(B_i|A) = E(W|A),$$

which shows that

$$E_\Pi(Z) = E_\Pi(W).$$


It is of interest to look at the extreme cases. If we take $\Pi$ to be the finest possible partition
where the sets are singletons, then $E_\Pi(W)$ is just $W$. If we take $\Pi$ to be the partition consisting
of just one set, namely the whole space, which is the least fine partition, then $E_\Pi(W)$ is just
the usual expectation $E(W)$. The latter observation leads to some results of interest which we
use in Chapter 24. Immediately from part (c) of Theorem 20.6, for any $W$,

$$E[E_\Pi W] = E(W). \qquad (20.24)$$

More generally, if $Z$ is any random variable which is constant on the sets of the partition $\Pi$,
then from part (b) of Theorem 20.6,

$$E(ZW) = E[E_\Pi(ZW)] = E[ZE_\Pi W]. \qquad (20.25)$$
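On a finite sample space these properties can be checked directly. The sketch below implements the conditional expectation with respect to a partition and verifies parts (b) and (c) of Theorem 20.6 together with (20.24) and (20.25) on a small made-up example. The sample space, probabilities and random variables are arbitrary illustrative choices, and `cond_exp` is a hypothetical helper name.

```python
from fractions import Fraction


def cond_exp(w, prob, partition):
    """E_Pi(W): on each block B of the partition, replace W by E(W | B)."""
    result = {}
    for block in partition:
        p_block = sum(prob[x] for x in block)
        mean = sum(w[x] * prob[x] for x in block) / p_block
        for x in block:
            result[x] = mean
    return result


omega = [1, 2, 3, 4, 5, 6]
prob = {1: Fraction(1, 12), 2: Fraction(2, 12), 3: Fraction(3, 12),
        4: Fraction(1, 12), 5: Fraction(2, 12), 6: Fraction(3, 12)}
coarse = [[1, 2, 3], [4, 5, 6]]            # the partition Pi
fine = [[1], [2, 3], [4, 5], [6]]          # Pi', finer than Pi
whole = [omega]                            # the one-set partition: ordinary expectation

W = {x: Fraction(x * x) for x in omega}
Z = {x: Fraction(2 if x <= 3 else 5) for x in omega}   # constant on each block of Pi
ZW = {x: Z[x] * W[x] for x in omega}

e_coarse = cond_exp(W, prob, coarse)
e_fine = cond_exp(W, prob, fine)

# Part (b): a factor constant on the blocks can be taken outside.
part_b = cond_exp(ZW, prob, coarse) == {x: Z[x] * e_coarse[x] for x in omega}
# Part (c): either order of composition gives E_Pi(W).
part_c = (cond_exp(e_coarse, prob, fine) == e_coarse
          and cond_exp(e_fine, prob, coarse) == e_coarse)
# (20.24) and (20.25), using the one-set partition for the ordinary expectation.
eq_24 = cond_exp(e_coarse, prob, whole) == cond_exp(W, prob, whole)
eq_25 = (cond_exp(ZW, prob, whole)
         == cond_exp({x: Z[x] * e_coarse[x] for x in omega}, prob, whole))
print(part_b, part_c, eq_24, eq_25)
```

Exact fractions are used so that the identities are tested as equalities rather than up to rounding.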


20.13.3 The arbitrage-free condition in the bond market 


Return to our bond market with the assumption that for all k, the random variable v(k, n) is 
k-determined. What is the condition on v(k, n) to ensure that the market is arbitrage-free? 

Let's go back to the deterministic setting of Chapter 2 for a moment. Note that if a special
case of Equation (2.1) holds, namely

$$v(k,n) = v(k,k+1)v(k+1,n) \qquad (20.26)$$

for all nonnegative integers $k < n$, then a straightforward induction argument shows that (2.1)
will hold for all nonnegative integers $k \le m \le n$. (Example 20.9 below will make it clear why
we begin with Equation (20.26) in place of the more general Equation (2.1).)

We saw above that (2.1) does not hold when considered as a statement about random
variables, but could it possibly be that this revised version will be valid? The answer is still
no, since the left hand side is $k$-determined, but in general, since $v(k+1,n)$ is only
$(k+1)$-determined, the right side will only be $(k+1)$-determined. It is plausible, however, that the
following natural modification of our statement holds, namely

$$v(k,n) = v(k,k+1)E_k v(k+1,n). \qquad (20.27)$$

It turns out in fact that Equation (20.27) is the correct condition to prevent arbitrage.


Theorem 20.7 If we can find a probability measure $Q$ on $\Omega$ such that Equation (20.27)
holds for all nonnegative integers $k < n \le N$, then the given bond market is arbitrage-free.
Proof. We will show that under $Q$, each $\tilde{S}_n$ satisfies (20.11) and we can then apply
Theorem 20.4. We know from the definition of $S_0$ that

$$S_0(k+1) = S_0(k)v(k,k+1)^{-1}.$$

Moreover,

$$\tilde{S}_n(k+1) = \frac{v(k+1,n)}{S_0(k+1)} = \frac{v(k+1,n)v(k,k+1)}{S_0(k)}.$$

Now $S_0(k)$ is certainly a $k$-determined random variable, being a product of $k$-determined
random variables, and so therefore is $S_0(k)^{-1}$. Using Theorem 20.6(b) and invoking Equation
(20.27),

$$E_k(\tilde{S}_n(k+1)) = \frac{E_k[v(k+1,n)]v(k,k+1)}{S_0(k)} = \frac{v(k,n)}{S_0(k)} = \tilde{S}_n(k),$$

completing the proof.


20.13.4 Short-rate modelling 


We now deal with a problem that is different from that in previous sections. Instead of being
given the asset prices, and asked if the market is arbitrage-free or not, we are given some prices
and want to determine other prices to satisfy the arbitrage-free condition.

For example, refer back to Equation (2.6). This shows in effect that in a deterministic
setting, the prices for bonds of one period determine those for all periods. We do in fact have
a stochastic version of this formula obtained by taking expectations.

Suppose we have a probability measure $Q$ on $\Omega$ and random variables $v(0,1)$,
$v(1,2), \ldots, v(N-1,N)$, where $v(k,k+1)$ is $k$-determined. We extend this to all our prices
by the rule that for $k < n$,

$$v(k,n) = E_k[v(k,k+1)v(k+1,k+2)v(k+2,k+3)\cdots v(n-1,n)]. \qquad (20.28)$$
This will indeed result in an arbitrage-free market, since

$$\begin{aligned}
v(k,k+1)E_k[v(k+1,n)] &= v(k,k+1)E_k[E_{k+1}[v(k+1,k+2)\cdots v(n-1,n)]] \\
&= v(k,k+1)E_k[v(k+1,k+2)\cdots v(n-1,n)] \\
&= E_k[v(k,k+1)v(k+1,k+2)\cdots v(n-1,n)] \\
&= v(k,n),
\end{aligned} \qquad (20.29)$$

establishing condition (20.27). Here we used the definition (20.28) in the first and last equalities.
For the second equality we used Theorem 20.6(c), noting that for any $k$-admissible sequence
$v$, the set $v^*$ is a disjoint union of sets $v_i^*$ where each $v_i$ is a $(k+1)$-admissible sequence:
simply add on all possible choices for the last element. So as $k$ increases, the partitioning by
$k$-admissible sequences gets finer. Finally, we use Theorem 20.6(b) for the third equality.
Example 20.9 Take $N = 3$. As remarked above, we need only consider a two-period
model. We will take the binomial model, and let $Q$ be the probability measure that assigns
equal probability of 1/4 to each of the four elements in $\Omega = \{UU, UD, DU, DD\}$. Define the
random variables $v(0,1)$, $v(1,2)$ and $v(2,3)$ by $v(0,1) = 0.7$, $v(1,2)(U) = 0.8$, $v(1,2)(D) = 0.6$,
$v(2,3)(UU) = 0.9$, $v(2,3)(UD) = 0.7$, $v(2,3)(DU) = 0.7$, $v(2,3)(DD) = 0.5$.

(a) Use (20.28) to find the distribution under $Q$ of the other random variables $v(k,n)$ for
$k < n$, which will make the market arbitrage-free.

(b) Show that it is not necessarily true that $v(k,n) = v(k,m)E_k v(m,n)$ if $m \neq k + 1$.


Solution. (a) The other random variables are $v(0,2)$, $v(0,3)$, $v(1,3)$, which we will calculate
directly from Equation (20.28).

$$v(0,2) = E[v(0,1)v(1,2)] = \tfrac{1}{2}(0.7 \times 0.8 + 0.7 \times 0.6) = 0.49.$$

The 2-determined random variable $v(0,1)v(1,2)v(2,3)$ takes the value of $0.7 \times 0.8 \times 0.9$
on $UU$, the value of $0.7 \times 0.6 \times 0.7$ on $DU$, the value of $0.7 \times 0.8 \times 0.7$ on $UD$, and the value
of $0.7 \times 0.6 \times 0.5$ on $DD$. Each of these paths has probability 1/4, so

$$v(0,3) = E[v(0,1)v(1,2)v(2,3)] = \tfrac{1}{4}[0.504 + 0.294 + 0.392 + 0.210] = 0.35.$$

Now $v(1,2)v(2,3)$ takes the value of $0.8 \times 0.9$ on $UU$ and the value of $0.8 \times 0.7$ on $UD$, so

$$v(1,3)(U) = \tfrac{1}{2}(0.72 + 0.56) = 0.64.$$

Also, $v(1,2)v(2,3)$ takes the value of $0.6 \times 0.7$ on $DU$ and the value of $0.6 \times 0.5$ on $DD$, so

$$v(1,3)(D) = \tfrac{1}{2}(0.42 + 0.30) = 0.36.$$

(b) $v(0,2)E_0[v(2,3)] = (0.49)(0.7) = 0.343 \neq v(0,3)$.
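The calculations in this example can be reproduced in a few lines. The sketch below stores the one-period prices by node, computes $v(k,n)$ from Equation (20.28) by averaging over the equally likely paths that extend a node, and then checks condition (20.27) and part (b). The data layout and the helper names `expect` and `v` are illustrative choices.

```python
from fractions import Fraction as F
from itertools import product

PATHS = ["".join(p) for p in product("UD", repeat=2)]   # UU, UD, DU, DD, each with probability 1/4

# One-period prices v(k, k+1), keyed by the first k moves of the path.
SHORT = {
    0: {"": F(7, 10)},
    1: {"U": F(8, 10), "D": F(6, 10)},
    2: {"UU": F(9, 10), "UD": F(7, 10), "DU": F(7, 10), "DD": F(5, 10)},
}


def expect(history, g):
    """E_k at the node `history` (k = len(history)): average g over the paths extending it."""
    paths = [w for w in PATHS if w.startswith(history)]
    return sum(g(w) for w in paths) / len(paths)


def v(k, n, history):
    """Bond price v(k, n) at the node `history`, from Equation (20.28)."""
    def product_along(w):
        out = F(1)
        for j in range(k, n):
            out *= SHORT[j][w[:j]]
        return out
    return expect(history, product_along)


print(v(0, 2, ""), v(0, 3, ""), v(1, 3, "U"), v(1, 3, "D"))

# Condition (20.27) with k = 0, n = 3: v(0,3) = v(0,1) E_0[v(1,3)].
one_step = v(0, 1, "") * expect("", lambda w: v(1, 3, w[:1]))
# Part (b): the analogous product with m = 2 does not reproduce v(0,3).
two_step = v(0, 2, "") * expect("", lambda w: v(2, 3, w[:2]))
print(one_step == v(0, 3, ""), two_step, v(0, 3, ""))
```

Averaging with equal weights is valid here only because $Q$ is uniform on the four paths; a general measure would need the path probabilities carried explicitly.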


To tie this example in with previous material, the reader may find it instructive to reproduce
the values shown in Figures 20.7 and 20.8. Figure 20.7 shows all the asset prices for all four
assets. Figure 20.8 shows the values of $\tilde{S}_k(n)$ for $k = 1, 2, 3$. The values of $\tilde{S}_0(n)$ are of course
all equal to 1.

Instead of specifying the one-period bond prices, we could equivalently have specified
the interest rates as we did in Chapter 2, except that now $i_k = v(k,k+1)^{-1} - 1$, which is a
$k$-determined random variable.

We have shown that we can model stochastic interest rates in much the same way as we
did in the deterministic model, that is, by choosing the one period rates first. For this reason,
this procedure is sometimes referred to as short-rate modelling. What may be surprising at
first is that in the stochastic case the result is far from unique. We can indeed specify any
probability measure we want for each $i_k$, and achieve an arbitrage-free bond market. The
procedure does not tell us how to choose the random variables $i_k$ and the probability measure
$Q$. In practice, this can be motivated by trying to satisfy other conditions that one wants to
impose. For example, at time 0, one knows the values $v(0,n)$. But as we saw in Example 22.7,
the values $v(0,n)$ are determined from the one-period rates. A problem of some interest, which
we will not deal with here, is to choose the distributions of the $i_k$ so as to recover specified
values of $v(0,n)$.

Figure 20.7 Example 20.9. Values of $S_k(n)$, $k = 0, 1, 2, 3$, listed as $(S_0, S_1, S_2, S_3)$ at each node of the tree:

Time 0: $(1, 0.7, 0.49, 0.35)$
Time 1, node U: $(0.7^{-1}, 1, 0.8, 0.64)$
Time 1, node D: $(0.7^{-1}, 1, 0.6, 0.36)$
Time 2, node UU: $(0.56^{-1}, 0.8^{-1}, 1, 0.9)$
Time 2, node UD: $(0.56^{-1}, 0.8^{-1}, 1, 0.7)$
Time 2, node DU: $(0.42^{-1}, 0.6^{-1}, 1, 0.7)$
Time 2, node DD: $(0.42^{-1}, 0.6^{-1}, 1, 0.5)$


20.13.5 Forward prices and rates 


We now expand on our definition of forward bond prices, which we introduced in Section 2.12.
Suppose we are given a financial bond market as above. For $j \le k \le n$, we let

$v_j(k,n)$ = the forward price to be paid at time $j$ for a 1-unit zero-coupon bond issued at
time $k$ and maturing at time $n$.

This will be a $j$-determined random variable, since it is random depending on the state of
nature up to time $j$. The quantity $v(k,n)$ defined in Section 2.12 could be labeled as $v_0(k,n)$.

The same argument as used in Section 2.12 shows that to prevent arbitrage, we must have

$$v_j(k,n) = \frac{v(j,n)}{v(j,k)}.$$

Indeed, if there is any sample point at which there is a discrepancy between these two random
variables, an individual could wait and see if this particular sample point materialized at time
$j$, which would be known by the fact that the random variables are $j$-determined. One could
then follow one of two strategies, depending on which way the inequality went, to achieve a
sure profit at time $n$.

In particular $v_j(j,n) = v(j,n)$, which is clear from the definition.

Figure 20.8 Example 20.9. Values of $\tilde{S}_k(n)$, $k = 1, 2, 3$, listed as $(\tilde{S}_1, \tilde{S}_2, \tilde{S}_3)$ at each node of the tree:

Time 0: $(0.7, 0.49, 0.35)$
Time 1, node U: $(0.7, 0.56, 0.448)$
Time 1, node D: $(0.7, 0.42, 0.252)$
Time 2, node UU: $(0.7, 0.56, 0.504)$
Time 2, node UD: $(0.7, 0.56, 0.392)$
Time 2, node DU: $(0.7, 0.42, 0.294)$
Time 2, node DD: $(0.7, 0.42, 0.210)$

In place of the forward prices, we can describe the information available from the bond
prices at time $j$ by interest rates as we did before. For $j \le k$, we define a $j$-determined random
variable $i_j(k)$, the forward interest rate for contracts at time $j$ applicable to the time period $k$
to $k + 1$, by

$$i_j(k) = v_j(k,k+1)^{-1} - 1.$$

Note that $i_k(k) = i_k$. In the deterministic case $i_j(k) = i_k$ for all $j \le k$.
It follows immediately from the definitions that

$$v(k,n) = v_k(k,k+1)v_k(k+1,k+2)\cdots v_k(n-1,n) = [(1 + i_k(k))(1 + i_k(k+1))\cdots(1 + i_k(n-1))]^{-1}, \qquad (20.30)$$

so that bond prices are determined by the forward interest rates. This suggests an alternative
procedure to short-rate modelling for bond prices. The method is to model all the forward
interest rates in place of the short rate $i_k$. In this case, the forward rates at time 0 are all known,
so the model will clearly reproduce the observed prices at time 0. One cannot, however, choose
the distributions of the forward rates arbitrarily as for the short rates. Certain conditions must
be imposed to ensure that the model is arbitrage-free. (See Exercise 20.12.) This is another
topic that is beyond our scope for further elaboration.
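As a small illustration of how forward rates encode the time-0 bond prices, the sketch below takes the prices $v(0,1) = 0.7$, $v(0,2) = 0.49$, $v(0,3) = 0.35$ found in Example 20.9, computes the time-0 forward rates, and rebuilds the prices from them as in Equation (20.30) with $k = 0$. It assumes the usual no-arbitrage relation that the time-0 forward price for the period from $k$ to $n$ equals $v(0,n)/v(0,k)$; the function names are illustrative.

```python
from fractions import Fraction as F

# Time-0 bond prices v(0,0), v(0,1), v(0,2), v(0,3) from Example 20.9.
spot = [F(1), F(70, 100), F(49, 100), F(35, 100)]


def forward_price(k, n):
    """Time-0 forward price v_0(k, n), assumed equal to v(0, n) / v(0, k)."""
    return spot[n] / spot[k]


def forward_rate(k):
    """Time-0 forward rate i_0(k) = v_0(k, k+1)^(-1) - 1."""
    return 1 / forward_price(k, k + 1) - 1


def price_from_rates(n):
    """Rebuild v(0, n) from the forward rates, as in Equation (20.30) with k = 0."""
    accumulated = F(1)
    for m in range(n):
        accumulated *= 1 + forward_rate(m)
    return 1 / accumulated


print([forward_rate(k) for k in range(3)])
print([price_from_rates(n) for n in range(4)])
```

Because the forward rates at time 0 are read off the observed prices, rebuilding the prices from them returns the inputs exactly, which is the sense in which a forward-rate model reproduces the time-0 prices by construction.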


20.13.6 Observations on the continuous time bond market 


We have concentrated on the discrete time setting for the bond market, but many applications
involve a continuous time framework. In this section, we give a very quick overview. It is
intended mainly to bridge the gap and prepare those who wish to look at other sources and
compare the material there with what we have done above.

Our market now will consist of zero-coupon bonds maturing at any time $t$, with a price at
time $s$ of $v(s,t)$. This will be an $s$-determined random variable under an appropriate extension of
this definition to cover continuous time. As well, we can define an extension of the conditional
expectation $E_s$. We will not go into details.
In place of the forward and spot rates of interest, one often wants to deal with the
corresponding forces of interest, generalizing the quantity $\delta(t)$ to a stochastic setting. The
forward force of interest, for contracts at time $s$ applicable to time $t$, is given by

$$\delta_s(t) = -\frac{\partial}{\partial t} \log v(s,t),$$

an $s$-determined random variable.

In the deterministic case, when we can assume that our basic Equation (2.1) holds, then
$\log v(s,t) = \log v(t) - \log v(s)$ and all forward forces will be equal to $\delta(t)$ as defined in (8.3).
Knowing the forward forces determines bond prices uniquely, since by the definition

$$v(s,t) = e^{-\int_s^t \delta_s(r)\,dr},$$

which we can view as a stochastic version of Equation (8.4) and as a continuous version of
Equation (20.30).

We can also define a continuous and stochastic version of the short rate. Let $\delta(t)$ denote
$\delta_t(t)$. This perhaps requires some care, since the partial derivative applies only to the second
variable. Precisely,

$$\delta(t) = -\lim_{h \to 0} \frac{\log v(t,t+h)}{h},$$

which will be a $t$-determined random variable, and one which agrees with our Chapter 8
definition in the deterministic case. As in the discrete time case, this will not uniquely
determine bond prices in a stochastic setting. Given any stochastic process for $\delta(t)$, we can
produce an arbitrage-free bond market by taking

$$v(s,t) = E_s\left[e^{-\int_s^t \delta(r)\,dr}\right]$$

under a suitable interpretation of an integral of a random integrand.


Notes and references 


This chapter constitutes a basic introduction to financial markets. For more comprehensive 
coverage of the basic concepts, see Hull (2014) or McDonald (2012). Readers particularly 
interested in the mathematics of continuous-time models can consult Bjork (2009) or Etheridge 
(2002) for sources that are not overly advanced. 

For many years, formula (20.19) was referred to as the Black-Scholes formula, recognizing 
the authors of the original paper that was published. Recently, many writers have added the 
name of Robert Merton, who made important contributions to developing and extending the 
ideas behind this result. 

A proof of the extension result on hyperspaces can be found in Steland (2012), Theorem 2.4.5.

Those reading the financial literature should be aware that the terminology can differ from
the actuarial conventions that we have adopted. Financial economists often use the term ‘rate
of interest’ to mean the continuously compounded rate, which we have called the force of
interest.
Exercises


20.1 A one-period financial market has, in addition to the bank account, one risky asset.
The price at time 0 is 50, and at time 1, the price will be either 52 or 55, each with
positive probability. For what value of the risk-free interest rate r will this market be
arbitrage-free?


20.2 For the option contract introduced in Section 20.5, find an arbitrage opportunity if
the option price is 11.


20.3 For the option contract introduced in Section 20.5, find the price if r is changed to
(a) 11%, (b) 9%. Can you explain why the prices change in this way?


20.4 Show that a trading strategy is self-financing if and only if, for 0 ≤ n < N,


$$V(n + 1) - V(n) = \sum_{j=0}^{M} a_j(n)\,\bigl(S_j(n + 1) - S_j(n)\bigr).$$


20.5 Assume the data of Example 20.2. Find the self-financing trading strategy for a
2-year American put option with a strike price of 106. Compare the price with the
corresponding European put option.


20.6 Consider the financial market with N = 1, M = 1, r = 0, S_1(0) = 100, S_1(1) =
130, 110, or 80, all with positive probability.

(a) Find all possible risk-neutral probability measures.

(b) Consider a 1-year call option with a strike price of 100. (This is known as an
‘at-the-money’ option.) Find the range of possible prices for this option that will
avoid arbitrage.

(c) If the price of the option in part (b) is 12, find an arbitrage opportunity.


20.7 Consider the two risky asset market of Figure 20.6 with r = 0.

(a) Use Theorems 20.4 and 20.5 to decide whether or not this market is (i) arbitrage-free,
(ii) complete.

(b) Describe the subspaces L and L_0.


20.8 In Figure 20.3, find the slopes of the two lines representing L_0 in terms of u, d
and r.


20.9 When pricing options with the Black-Scholes-Merton formula, describe how option
prices change as changes occur in volatility, strike price, duration, and risk-free
force of interest. Use the formula and put-call parity to compute put and call prices
for 3-month European options, on a stock selling now for 100 with a strike price of
110, 100 and 90 respectively, assuming σ is 0.2 and the force of interest for a 1-year
period is 0.06. Verify your conclusions above by repeating the calculations for other
values of σ, δ and duration.
20.10 For a certain stock, the price of a call option is 7.80 and the price of the corresponding
put option with the same strike price and expiration date is 1.50. In each of the
following scenarios, find the new price of the put option.

(a) A change in volatility raises the call price to 11.30.

(b) The stock price at time 0 is 100. A 10% increase in the strike price lowers the
call price to 5.00.

(c) The risk-free force of interest is doubled and the time to expiration is cut in half,
which combine to lower the call price to 7.00.


20.11 Consider a 6-month European call option on a stock now selling for 100, with a
strike price of 97. You are given that σ = 0.25 and the force of interest for a 1-year
period is 0.10. You would like to replicate this option by a trading strategy involving
holdings in the stock and risk-free zero-coupon bonds of face amount 97, maturing
in 6 months. You plan to use the Black-Scholes-Merton formula.

(a) What is the initial portfolio at time 0?

(b) At the end of 2 months, the stock price has risen to 105. What is your portfolio
now?

(c) At the end of 4 months, the stock price has fallen to 95. What is your portfolio
now?


20.12 Give a direct verification for the Black-Scholes-Merton formula in the case that
σ = 0.


20.13 Verify directly from Figure 20.8 that the market of Example 20.9 is arbitrage-free
and complete.


20.14 Redo Example 20.9 only assuming that the probability of an upward move is 3/4
and that of a downward move is 1/4.


20.15 A bond market has two transitions, U and D, from time 0 to time 1. You are given
the following forward prices:

(i) v_0(0, 1) = v_0(1, 2) = v_0(2, 3) = 0.7;
(ii) v_1(1, 2) and v_1(2, 3) both take the value 0.8 on U and 0.6 on D.

(a) If Q(U) = Q(D) = 0.5, show that Equation (20.27) is not satisfied.

(b) Show that there is no probability measure Q for which Equation (20.27) is
satisfied.


20.16 Consider the bond market with prices as given in Figure 20.7. Consider a call option
on a bond maturing for 1000 at time 3, with expiration date time 2. Find the price of
the option if the strike price is (a) 900, (b) 650.


*20.17 A universal life contract provides that your account will be credited with a minimum
of 4% interest, up to a maximum of 10% interest. Describe the options present in
this arrangement.


Part IV 
RISK THEORY 


21 


Compound distributions 


21.1 Introduction 


In earlier parts of this book, we concentrated on the present value of the benefits paid on a 
single insurance or annuity contract. The insurer, of course, is interested in the total benefits 
paid on an entire portfolio of policies. An obvious way to handle this is simply to obtain 
the present value of the total amount paid on all policies in the portfolio, as the sum of 
the individual random variables. This is known as the individual risk model. There is another 
method for estimating the total amount paid on a group of policies, known as the collective risk 
model, which has advantages in certain cases. In this chapter, we deal with a static version of 
this model, covering a 1-year time period. Chapter 23, concerned with ruin theory, will involve 
a dynamic multi-period version of the collective risk model. The combined subject matter of 
these chapters has traditionally been referred to as risk theory in the actuarial literature. The 
collective risk model is particularly useful for casualty insurance such as automobile, home, 
or health policies. The following are three main ways in which such contracts differ from life 
insurance: 


1. In a given period, there can be several claims under a single policy. Clearly, you can 
have several accidents or several visits to the doctor, even in a relatively short period. 
However, no matter how long the period, you can only die once. 


2. The amount of each claim can vary substantially. A collision claim under an automobile 
policy can range from a small amount for a dented fender, to the complete cost of the 
vehicle. A health claim may involve a single visit to a doctor, or it may involve 
prolonged treatment, drugs, and hospital care costing a large amount of money. By 
contrast, although the amount paid on a life insurance policy can vary by time of death, 
we do not have variation of the amount for different ‘kinds’ of dying. 


Fundamentals of Actuarial Mathematics, Third Edition. S. David Promislow. 
3. Such contracts are usually written for a relatively short period such as a year, and are
then renewed if the insured wishes to continue. They do not have the long-term nature 
of the life contracts we have discussed. Consequently, the effect of interest is not so 
important. To simplify the mathematics, the effects of interest will be ignored in the 
models of the subsequent chapters. 


The collective risk model views total claims as a compound distribution, which we will 
now examine. To motivate the idea, consider the following game. Toss two coins, and for 
each head that comes up, throw a die. What is the distribution of the total? First, we identify 
the range. The possible totals can range from 0, which occurs if you toss two tails, to 12, 
which occurs if you toss two heads, and get a 6 on each of the two throws of the die. After 
an elementary but somewhat tedious calculation, we can arrive at the following distribution, 
which the reader should verify before proceeding any further. The probabilities of 0 to 12 
respectively, in multiples of 1/144, are 36, 12, 13, 14, 15, 16, 17, 6, 5, 4, 3, 2, 1. 
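
The verification requested above can also be done by brute force. The following is a minimal sketch in Python (the function name and the trick of always "throwing" a die for each coin, then ignoring it when the coin shows a tail, are choices made here for convenience and are not part of the text):

```python
from fractions import Fraction
from itertools import product

def coin_dice_distribution():
    """Exact distribution of the total in the two-coin, one-die-per-head game."""
    dist = {}
    die = range(1, 7)
    # Enumerate both coins (1 = head, 0 = tail) and a potential die throw for each coin.
    for c1, c2, d1, d2 in product((0, 1), (0, 1), die, die):
        total = c1 * d1 + c2 * d2        # a die only counts if its coin shows a head
        weight = Fraction(1, 4 * 36)     # all 4 * 36 combinations are equally likely
        dist[total] = dist.get(total, 0) + weight
    return dist

dist = coin_dice_distribution()
numerators = [int(dist[k] * 144) for k in range(13)]
print(numerators)
```

Exact fractions are used so that the result can be compared directly with the list of multiples of 1/144 given above.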

We could complicate this problem tremendously. Instead of two coins, toss 1000. Instead 
of a simple die throw, choose a much more complicated random variable, possibly one with a 
continuous distribution. It may become impossible to actually calculate the exact distribution 
as we did above, but we still may want to say something. At the very least we want to compute 
the mean and variance of the resulting distribution, or possibly higher moments. We may want 
to calculate the moment generating function. We may be able to find a known distribution that 
closely approximates the one we are interested in. 

What is the relation of this game to insurance? The collective risk model identifies two 
main factors that influence the total claims. One is the claim frequency, that is, the number 
of claims that will occur over a certain period. This will be a discrete random variable taking 
nonnegative integers as values. The second factor is the amount that will be paid, given that a 
claim has occurred. This is known as the severity of the claim. We have observed that in any 
fixed period under a life insurance policy, the claim severity is normally just a constant, but 
under other types of insurance, it will vary substantially. Even though the insurer is ultimately 
interested in the total payout, it has been found advantageous to first model frequency and 
severity separately and to then combine the results to determine total claims. One reason 
for this is that changed conditions can affect these factors in different ways. For example, 
requiring automobile passengers to wear seat belts has little effect on the frequency of car 
accidents, but it certainly tends to reduce the claim payments for personal injuries. On the 
other hand, the introduction of daytime headlights is likely to have little effect on the severity 
of claims, but it might well reduce the number of accidents. Another example involves the 
effect of seasonal differences. It may be that people drive faster during summer months, when 
the weather is better, so a typical summer accident is more serious than one in the winter. By 
contrast, one might well expect more accidents in the winter. 

We will now describe the formal model. We have a fixed period, a collection of policies, 
and we want to predict S, the total claims from all policies over that period. We let N denote 
the frequency of claims and X the severity, both of which we model as random variables. 
Throughout the discussion, we make some standard assumptions, which are reasonable in 
most insurance situations, although they may not always hold exactly. We assume that the 
severity of claims is independent of the frequency and that the severity of any one claim is 
independent of that of others. We also postulate that the severity follows the same distribution 
over the period. This will normally hold for a sufficiently small period, although it may not 
for longer periods due to seasonal differences such as that alluded to above. 
Let X_i denote the amount of the ith claim. Our last assumption says that there is a single
severity distribution given by a random variable X, and we assume that each X_i is distributed
as X. The total amount of claims is then simply given by the independent sum


$$S = X_1 + X_2 + \cdots + X_N. \tag{21.1}$$


We have encountered sums of random variables before, but the above is quite different, 
since the number of summands is random, rather than a fixed integer. 

We can now observe that the simple coin-dice problem we mentioned at the beginning is
really an example of this type, where N takes the values 0, 1, 2 with respective probabilities
1/4, 1/2, 1/4, and X takes the values 1, 2, ..., 6 with equal probabilities.

The distribution of S is known as a compound distribution, which is usually prefaced by
referring to the distribution of N. For example, if N is Poisson, we call S a compound Poisson
distribution. The terminology comes from the fact that when X is a random variable that takes
the value 1 with certainty, then the resulting compound distribution is just N itself.

To summarize the goals in this chapter, we will be given N and X, and our object is to 
investigate the compound distribution S as given by (21.1). We will denote this distribution 
by the symbol (N, X). 


21.2 The mean and variance of S 
As mentioned, although it may be difficult to calculate the distribution of S exactly, it is 
quite simple to find its mean and variance, given the corresponding quantities for N and X. 


Throughout this chapter, we will let p denote the probability function of N. From the law of
total expectation (A.29),


$$E(S) = \sum_{n=0}^{\infty} E(S \mid N = n)\,p(n).$$


When N = n, S is just the sum of n independent copies of a random variable with the
distribution of X. By using the fact that N and X are independent, we can write


$$E(S \mid N = n) = nE(X \mid N = n) = nE(X),$$


leading to


$$E(S) = E(X)E(N), \tag{21.2}$$


an intuitively obvious result. Similarly,


$$E(S^2) = \sum_{n=0}^{\infty} E(S^2 \mid N = n)\,p(n).$$


The second moment of any random variable is the sum of the variance plus the square of
the mean, so that


$$E(S^2 \mid N = n) = \mathrm{Var}(S \mid N = n) + [E(S \mid N = n)]^2 = n\,\mathrm{Var}(X) + n^2 (E(X))^2.$$


We use here the fact that the variance of a sum of independent random variables is the sum of
the variances. The last two formulas yield


$$E(S^2) = E(N)\,\mathrm{Var}(X) + E(N^2)\,E(X)^2,$$


and subtracting the term E(S)^2 = E(N)^2 E(X)^2, we obtain


$$\mathrm{Var}(S) = E(N)\,\mathrm{Var}(X) + E(X)^2\,\mathrm{Var}(N). \tag{21.3}$$
There is an informative explanation of the above formula. Variance represents uncertainty,
and this decomposes the uncertainty in the value of S into two parts. The first term gives the
uncertainty resulting from the severity, and the second term gives the uncertainty arising from
the frequency.
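
As a sanity check, formulas (21.2) and (21.3) can be compared with moments computed directly from the exact coin-dice distribution quoted in Section 21.1. The sketch below is illustrative only; the helper name `mean_var` is an assumption made here.

```python
from fractions import Fraction

def mean_var(pmf):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    second = sum(x * x * p for x, p in pmf.items())
    return mean, second - mean ** 2

# Frequency and severity of the coin-dice game.
freq = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
sev = {k: Fraction(1, 6) for k in range(1, 7)}

# Exact distribution of S quoted in Section 21.1, in multiples of 1/144.
counts = [36, 12, 13, 14, 15, 16, 17, 6, 5, 4, 3, 2, 1]
total = {k: Fraction(c, 144) for k, c in enumerate(counts)}

EN, VN = mean_var(freq)
EX, VX = mean_var(sev)
ES, VS = mean_var(total)

mean_formula = EX * EN                    # right side of (21.2)
var_formula = EN * VX + EX ** 2 * VN      # right side of (21.3)
print(ES, mean_formula)
print(VS, var_formula)
```

Both routes give a mean of 7/2 and a variance of 217/24, of which 35/12 comes from the severity term and 49/8 from the frequency term.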


21.3 Generating functions 


The same conditioning technique as used above can be employed to deduce the moment
generating function (m.g.f.) M_S(t) and the probability generating function (p.g.f.) P_S(t) (see
Sections A.9 and A.10). When N = n, we have that S = X_1 + X_2 + ... + X_n and so


$$E(e^{tS} \mid N = n) = E\left[e^{t(X_1 + X_2 + \cdots + X_n)}\right] = E\left[e^{tX_1} e^{tX_2} \cdots e^{tX_n}\right] = E(e^{tX_1})\,E(e^{tX_2}) \cdots E(e^{tX_n}),$$


where we invoke independence in order to write the expectation of a product as a product of
expectations. Since each X_i is distributed as the random variable X,


$$E(e^{tS} \mid N = n) = M_X(t)^n,$$


and


$$M_S(t) = \sum_{n=0}^{\infty} E(e^{tS} \mid N = n)\,p(n) = \sum_{n=0}^{\infty} M_X(t)^n\,p(n),$$


from which we conclude that


$$M_S(t) = P_N(M_X(t)). \tag{21.4}$$


When X takes nonnegative integer values, we can use (A.34) and (A.35) to conclude that


$$P_S(t) = M_S(\log t) = P_N(M_X(\log t)) = P_N(P_X(e^{\log t})),$$


giving us the nice result that


$$P_S(t) = P_N(P_X(t)). \tag{21.5}$$
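
For finite discrete N and X, the composition in (21.5) is a composition of polynomials, and the coefficients of the result are the probabilities of S. The following sketch applies this to the coin-dice game of Section 21.1; the helper names `poly_mul` and `compose` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of t)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def compose(outer, inner):
    """Coefficients of outer(inner(t)), built up by Horner's rule on polynomials."""
    result = [outer[-1]]
    for coeff in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += coeff
    return result

# P_N(t) = 1/4 + t/2 + t^2/4 and P_X(t) = (t + t^2 + ... + t^6)/6.
P_N = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
P_X = [Fraction(0)] + [Fraction(1, 6)] * 6

P_S = compose(P_N, P_X)                  # (21.5): P_S(t) = P_N(P_X(t))
print([int(c * 144) for c in P_S])
```

The printed numerators reproduce the distribution found by direct enumeration in Section 21.1.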
21.4 Exact distribution of S


In the previous two sections, we considered the problem of getting partial information about
S through moments and generating functions, but it is natural to ask if we can find the exact
distribution of this random variable. The answer is that it is easy enough to write down a
formula for this, but in all but some very simple cases, it is not at all easy to actually use the
formula to calculate numbers.

What is the probability that S takes a value less than or equal to s? Once again we use the
conditioning technique. If N = n, then the answer is just the probability that
X_1 + X_2 + ... + X_n ≤ s, which is the n-fold convolution of F_X at the point s, which we have
denoted by $F_X^{*n}(s)$ (see Section A.12). It then follows that


$$F_S(s) = \sum_{n=0}^{\infty} p(n)\,F_X^{*n}(s), \tag{21.6}$$


or similarly, by using density/probability functions,


$$f_S(s) = \sum_{n=0}^{\infty} p(n)\,f_X^{*n}(s), \tag{21.7}$$


where $f^{*0}(k)$ takes a value of 1 for k = 0 and zero elsewhere, and $f^{*1}$ is just f.


Example 21.1 Suppose that N takes the values 0, 1,2 with probabilities 0.5, 0.3, 0.2, 
respectively, and X takes the values 1,2,3 with probabilities 0.4, 0.2, 0.4, respectively. Find 
the distribution of S. 


Solution. We form the following table, in which for each row, the entries are multiplied by
the weights in the bottom row to get the totals in the far right-hand column:


k         f^{*0}(k)   f^{*1}(k)   f^{*2}(k)   f_S(k)
0         1           0           0           0.500
1         0           0.4         0           0.120
2         0           0.2         0.16        0.092
3         0           0.4         0.16        0.152
4         0           0           0.36        0.072
5         0           0           0.16        0.032
6         0           0           0.16        0.032
Weights   0.5         0.3         0.2
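
The table can be reproduced mechanically. The sketch below implements (21.7) for a frequency distribution with finite support and integer-valued severities; the function names are assumptions made for this illustration, and probability functions are stored as lists indexed by value.

```python
def convolve(f, g):
    """Convolution of two probability functions on 0, 1, 2, ... (lists indexed by value)."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def compound_pmf(p, f):
    """f_S from (21.7) for a frequency pmf p with finite support and integer severities."""
    size = (len(p) - 1) * (len(f) - 1) + 1
    f_S = [0.0] * size
    conv = [1.0]                      # f^{*0}: unit mass at 0
    for p_n in p:
        for k, c in enumerate(conv):
            f_S[k] += p_n * c
        conv = convolve(conv, f)      # move from f^{*n} to f^{*(n+1)}
    return f_S

p = [0.5, 0.3, 0.2]                   # P(N = 0), P(N = 1), P(N = 2)
f = [0.0, 0.4, 0.2, 0.4]              # P(X = 0), ..., P(X = 3)
f_S = compound_pmf(p, f)
print([round(v, 3) for v in f_S])
```

Each pass of the loop corresponds to one column of the table, weighted by the matching entry of the bottom row.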


21.5 Choosing a frequency distribution 


Given a certain portfolio of insurance policies, how does the insurer select appropriate
distributions in order to model the aggregate claims S? In this section, we focus on the claim
frequency N. We could conceivably do this by strictly empirical means. We might use past
data from similar policies to try to estimate a distribution. This is a statistical problem that 
we do not concentrate on in this book. There are, however, many advantages to choosing a 
distribution from one of several well-known families of discrete distributions. We then have 
nice mathematical expressions for the distribution. These families are based on one or more 
parameters. The estimation procedure is confined to choosing just these parameters from 
the observed data rather than the entire distribution. Three families that play major roles in 
modelling claim frequency are the binomial, Poisson, and negative binomial distributions. 
Details and notation are given in Sections A.11.1, A.11.2 and A.11.3 respectively. 

Is the binomial a suitable distribution for claim frequency? To illustrate, suppose that our 
period of time is 30 days, and our observed data show that on average we can expect 10 
claims over each 30-day period. Suppose also that we now assume a time homogeneity for 
frequency, which is analogous to the assumption for severity that we made as part of our 
general postulates. That is, we assume that the rate of claims remains constant over the period. 
(This assumption may not be completely realistic in certain cases. For automobile insurance, 
for example, there are more chances of an accident occurring during the rush hour than in the 
middle of the night.) As a final condition, suppose we assume that we will get at most one 
claim per day. We then can look upon this as 30 repeated trials. Each day we either get a claim, 
which constitutes ‘success’, or no claim which constitutes ‘failure’. In order that the expected 
number of claims equals our estimated value of 10, we must take the probability of a claim 
each day to be 1/3. So indeed, under our assumptions, we can model N by Bin(30, 1/3). 

However, what if we decide that our limit of one claim a day is not really an accurate 
assumption, and that we may well experience more on some days? We could get a more 
accurate model by assuming that there would be no more than one claim each half-day period. 
We would still get a binomial distribution, but now with m = 60, and we have to change 
p to 1/6 to preserve an expectation of 10. Perhaps, however, even this half-day limitation 
is not quite accurate, and we should replace it by an hour, or perhaps a minute, or even a 
second. In fact, why not allow complete freedom and take the limiting distribution? This leads
immediately to the Poisson distribution, which is one of the most common distributions
used for modelling claim frequency. It arises in a natural way from our independence and
time-homogeneity assumptions, by taking the limit of binomials.
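
The limiting argument can be watched numerically. The sketch below keeps the expected number of claims at 10 and refines the time grid from days to half-days, hours and minutes; the choice of grids and the range k = 0, ..., 30 are illustrative assumptions made here.

```python
from math import comb, exp, factorial

def binom_pmf(k, m, q):
    """Probability of k successes in m independent trials with success probability q."""
    return comb(m, k) * q ** k * (1 - q) ** (m - k)

def poisson_pmf(k, lam):
    """Poisson probability function with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

mean = 10                                  # expected claims per 30-day period
sizes = (30, 60, 720, 43200)               # days, half-days, hours, minutes
gaps = []
for m in sizes:
    # Largest gap between Bin(m, mean/m) and Poisson(mean) over k = 0, ..., 30.
    gap = max(abs(binom_pmf(k, m, mean / m) - poisson_pmf(k, mean)) for k in range(31))
    gaps.append(gap)
    print(m, gap)
```

The gap shrinks roughly in proportion to 1/m, which is consistent with the Poisson distribution being the limit of these binomials.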

It is worthwhile to note that the variance of a compound Poisson distribution has a
particularly simple form. If S ~ (N, X), where N ~ Poisson(λ), then from (A.42), E(N) =
Var(N) = λ, so that (21.3) gives Var(S) = λ[Var(X) + E(X)^2], that is,


$$\mathrm{Var}(S) = \lambda E(X^2). \tag{21.8}$$


The negative binomial is also a popular choice for modelling insurance claims, but the 
reason is not immediate. It stems from the following idea. Suppose we assume that each 
insured individual produces claims according to a Poisson distribution, but that the parameter 
of this distribution can differ according to this individual. For a simple example, suppose that 
automobile drivers are classified as either good or bad. Assume that the claims of the good 
drivers are distributed as Poisson(1), while the claims of the bad drivers are distributed as 
Poisson(2). Assume, furthermore, that 60% of drivers are good and the rest are bad, but that 
the insurer has no way of distinguishing between the classes. It would then be reasonable to 
model the claims frequency N as a mixture of the two distributions Poisson(1) and Poisson(2), 
with respective weights 0.6 and 0.4 (see Section A.13). Of course, this is oversimplified and 
we could strive for a more sophisticated model by considering a mixing distribution involving
several different values. We may even consider letting the parameter vary continuously over
an entire interval of positive numbers and take a continuous mixing distribution. Remarkably,
it turns out that if we take the gamma distribution (see Section A.11.6) for this purpose, our
mixed Poisson is a negative binomial. Precisely, if we want a mixture of random variables
N_λ ~ Poisson(λ), in which λ has a Gamma(α, β) distribution, then the resulting mixed
distribution N satisfies


$$f_N(k) = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^\infty \frac{\lambda^k e^{-\lambda}}{k!}\,\lambda^{\alpha-1} e^{-\beta\lambda}\,d\lambda
= \frac{\beta^\alpha}{\Gamma(\alpha)\,k!} \int_0^\infty \lambda^{\alpha+k-1} e^{-(\beta+1)\lambda}\,d\lambda
= \frac{\beta^\alpha\,\Gamma(\alpha+k)}{\Gamma(\alpha)\,k!}\,\frac{1}{(\beta+1)^{\alpha+k}}.$$


Applying (A.53) repeatedly gives


$$\Gamma(\alpha+k) = (\alpha+k-1)\,\Gamma(\alpha+k-1) = (\alpha+k-1)(\alpha+k-2)\,\Gamma(\alpha+k-2) = \cdots = (\alpha+k-1)(\alpha+k-2) \cdots \alpha\,\Gamma(\alpha).$$


By comparing with (A.45) we see that N ~ Negbin(α, (β + 1)^{-1}).
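
The identity can be checked numerically by integrating the Poisson probability against the gamma density. The sketch below is only an illustration: it assumes the Gamma(α, β) density is proportional to λ^(α-1) e^(-βλ) (β acting as a rate), uses a plain midpoint rule, and picks α = 3, β = 2 arbitrarily.

```python
from math import exp, lgamma, log

def negbin_pmf(k, alpha, beta):
    """Closed form: Gamma(alpha+k) beta^alpha / (Gamma(alpha) k! (beta+1)^(alpha+k))."""
    log_p = (lgamma(alpha + k) - lgamma(alpha) - lgamma(k + 1)
             + alpha * log(beta) - (alpha + k) * log(beta + 1))
    return exp(log_p)

def mixed_poisson_pmf(k, alpha, beta, upper=40.0, steps=20_000):
    """Midpoint-rule approximation of the mixing integral over lambda in (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        poisson = exp(-lam + k * log(lam) - lgamma(k + 1))
        gamma_density = exp(alpha * log(beta) + (alpha - 1) * log(lam)
                            - beta * lam - lgamma(alpha))
        total += poisson * gamma_density * h
    return total

alpha, beta = 3.0, 2.0                     # illustrative values only
rows = [(k, negbin_pmf(k, alpha, beta), mixed_poisson_pmf(k, alpha, beta))
        for k in range(6)]
for row in rows:
    print(*row)
```

The two columns agree to many decimal places, as expected from the derivation; other parameter values may need a finer grid or a larger upper limit.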


21.6 Choosing a severity distribution 


What distributions are suitable for measuring claim amounts? For many types of insurance, 
claims can assume a large number of values, and it is usually convenient to model claims by 
a continuous distribution. We will discuss some possibilities. See Sections A.11.5–A.11.8 for
details and notation. 

Is a normal distribution a suitable one for modelling severity? A possible drawback is that 
the claim distribution will almost always be positive-valued (it is not usual for the policyholder 
to pay the insurer), and the normal of course takes values over the entire real line. This by 
itself is not a major concern. If the mean of a normal is sufficiently high relative to its standard 
deviation, there will be so little chance of a negative value that for all practical purposes, we 
may as well consider it as positive-valued. 

A more major difficulty is that the normal density is not the right shape for most applications,
as it does not give sufficiently high weight to lower valued claims. We usually want a
distribution that has a greater concentration of mass on the left. (This could be described as a
distribution with the mean greater than the median.) The family of gamma distributions does
have this general shape that we want, and provides a popular choice for severity modelling.

Another important criterion for selecting a severity distribution is tail behaviour. For any
distribution X, the function s_X(t) = P(X > t) approaches 0 as t approaches ∞. However, the
rate at which convergence to 0 occurs will differ. We say that the distribution of X has heavier
right tails than the distribution of Y if s_Y(t)/s_X(t) approaches 0 (by L'Hôpital's rule, this
holds whenever f_Y(t)/f_X(t) approaches 0). A heavier-tailed distribution therefore gives more
weight to large values. If one desires a heavier-tailed distribution than the gamma, the Pareto
distribution (see Section A.11.8) is a possible choice. Using L'Hôpital's rule to find the limit
of the ratio of density functions, we can verify that this is heavier-tailed than any gamma
distribution. The heavy-tailed feature of this distribution is further revealed by the fact that
for large enough k, the kth moment becomes infinite.
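
To see the difference in tail behaviour numerically, one can evaluate the ratio of a gamma density to a Pareto density at increasingly large arguments. The sketch below assumes particular parametrizations (gamma with shape a and rate b; Pareto with density α θ^α / (θ + t)^(α+1)), which may differ from those of Sections A.11.6 and A.11.8, and the parameter values are arbitrary.

```python
from math import exp, gamma

def gamma_density(t, a, b):
    """Gamma density with shape a and rate b (parametrization assumed here)."""
    return b ** a * t ** (a - 1) * exp(-b * t) / gamma(a)

def pareto_density(t, alpha, theta):
    """Pareto density alpha * theta^alpha / (theta + t)^(alpha + 1) for t > 0 (assumed form)."""
    return alpha * theta ** alpha / (theta + t) ** (alpha + 1)

points = [10.0, 50.0, 100.0, 200.0]
ratios = [gamma_density(t, 2.0, 1.0) / pareto_density(t, 3.0, 2.0) for t in points]
for t, r in zip(points, ratios):
    print(t, r)
```

The ratio collapses towards 0 because the exponential factor in the gamma density eventually dominates any power of t, which is the behaviour the definition of a heavier tail asks for.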
21.7 Handling the point mass at 0


When X is discrete, then S is clearly discrete, but what happens when X is continuous?
Provided that N takes the value 0 with positive probability, we will have a distribution of
mixed type. Since S = 0 whenever N = 0, the distribution for S will have what is known as a
point mass at the point 0. It is often convenient to split off the continuous part. Let


S^+ denote the random variable S | S > 0


(see Section A.8), which will be continuous when X is. (Note that this is different from
S, = S|S > 0). Then S can be considered as a mixture of S^+ and 0 (the random variable that
always takes the value 0), with respective weights 1 - p(0) and p(0). It follows from (A.67)
that


$$F_S(s) = p(0)\,P(0 \le s) + (1 - p(0))\,F_{S^+}(s), \qquad s \ge 0.$$


Since the zero random variable is always less than or equal to s,


$$F_{S^+}(s) = \frac{F_S(s) - p(0)}{1 - p(0)}.$$


Similarly, since the m.g.f. of the zero random variable is identically equal to 1, the m.g.f. of
S^+ is given by


$$M_{S^+}(t) = \frac{M_S(t) - p(0)}{1 - p(0)}.$$


Example 21.2 Suppose that N has a Geom(p) distribution and X has an Exp(λ) distribution.
What is the distribution of S?


Solution. For a general severity distribution X,


$$M_S(t) = P_N(M_X(t)) = (1 - p)\,(1 - pM_X(t))^{-1},$$


and since p(0) = 1 - p,


$$M_{S^+}(t) = \frac{(1 - p)(1 - pM_X(t))^{-1} - (1 - p)}{p} = (1 - p)\left(\frac{M_X(t)}{1 - pM_X(t)}\right).$$


By substituting M_X(t) = λ/(λ - t) from (A.56), we have


$$M_{S^+}(t) = \frac{\lambda(1 - p)}{\lambda(1 - p) - t},$$


which from (A.56) again is the m.g.f. of an Exp(λ(1 - p)) distribution. Invoking Theorem
A.2, we know that the distribution of S is a mixture of 0 and an Exp(λ(1 - p)) distribution
with weights 1 - p and p, respectively.
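
The conclusion of Example 21.2 can be checked by comparing moment generating functions at a few points. In the sketch below, Geom(p) is taken to have p.g.f. (1 - p)/(1 - pz), so that p(0) = 1 - p as in the example; the function names and the values p = 0.3, λ = 2 are illustrative assumptions.

```python
def mgf_exp(t, rate):
    """m.g.f. of an exponential distribution with the given rate, valid for t < rate."""
    return rate / (rate - t)

def mgf_compound_geometric(t, p, lam):
    """M_S(t) = P_N(M_X(t)) with P_N(z) = (1 - p) / (1 - p z) and X ~ Exp(lam)."""
    return (1 - p) / (1 - p * mgf_exp(t, lam))

def mgf_mixture(t, p, lam):
    """m.g.f. of the mixture: mass 1 - p at 0 and weight p on Exp(lam * (1 - p))."""
    return (1 - p) + p * mgf_exp(t, lam * (1 - p))

p, lam = 0.3, 2.0                         # illustrative values only
checks = []
for t in (-1.0, 0.0, 0.5, 1.2):           # all below lam * (1 - p) = 1.4
    checks.append((mgf_compound_geometric(t, p, lam), mgf_mixture(t, p, lam)))
    print(t, *checks[-1])
```

Agreement of the two functions on an interval around 0 is what Theorem A.2 requires in order to identify the distributions.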
21.8 Counting claims of a particular type


21.8.1 One special class 


Suppose we divide our claims into two groups, ‘special’ and ‘non-special’, and we are
interested in knowing the number of special claims, as well as the total. Denote the special
claim frequency by N_1. Our problem is to deduce the distribution of N_1 given the distribution
of N. The special claims can be anything at all: those of high amount, those of low amount,
those divisible by 79. It does not really matter. All we need to know is the probability of
a special claim and we can write down a formula for the distribution of N_1. What is the
probability that N_1 = k, given that the probability of a special claim is π? In order to get k
special claims, there must be at least k claims in total. That is, N must take the value of k + r
for some nonnegative integer r. From the law of total probability (A.30),


$$P(N_1 = k) = \sum_{r=0}^{\infty} P(N_1 = k \mid N = k + r)\,p(k + r).$$


Given that we have k + r claims in total, the number of special claims out of these is distributed
as Bin(k + r, π), so we can substitute in the above to get


$$P(N_1 = k) = \sum_{r=0}^{\infty} \binom{k + r}{k} \pi^k (1 - \pi)^r\,p(k + r). \tag{21.9}$$

Example 21.3 N takes the values 0, 1, 2, 3, 4 with probabilities 0.3, 0.1, 0.3, 0.2, 0.1,
respectively, and X takes the values 1 to 100 with equal probabilities. What is the probability
that we have exactly 2 claims for an amount less than or equal to 60?


Solution. The special claims are those for an amount less than or equal to 60, so π = 0.6.


$$P(N_1 = 2) = 0.6^2\,(0.3 + 3 \times 0.4 \times 0.2 + 6 \times 0.4^2 \times 0.1) = 0.22896.$$
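
Formula (21.9) is easy to evaluate by machine when N has finite support. The sketch below redoes the calculation of Example 21.3 and also produces the whole distribution of the special claim count; the function name and the list representation of p are assumptions made for this illustration.

```python
from math import comb

def special_count_pmf(k, p, pi_special):
    """P(N_1 = k) from (21.9) for a frequency pmf p (list indexed by n) with finite support."""
    return sum(comb(n, k) * pi_special ** k * (1 - pi_special) ** (n - k) * p[n]
               for n in range(k, len(p)))

p = [0.3, 0.1, 0.3, 0.2, 0.1]          # P(N = 0), ..., P(N = 4)
pi_special = 0.6                       # probability that a claim is at most 60
answer = special_count_pmf(2, p, pi_special)
dist = [special_count_pmf(k, p, pi_special) for k in range(len(p))]
print(answer)
print(dist, sum(dist))
```

Here the summation index n plays the role of k + r in (21.9), so the sum stops at the largest value that N can take.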


While formula (21.9) may be good for calculating individual probabilities, it can be tedious
to calculate the entire distribution. It is often better to proceed by calculating the p.g.f. of N_1.
We do this by making the ingenious observation that N_1 is itself a compound distribution. In
fact, N_1 ~ (N, δ_π), where δ_π takes the value 1 with probability π and 0 with probability 1 - π.
This is clear, since we can count special claims by simply assigning a value of 1 whenever
we get a special claim and a value of 0 whenever we get a non-special claim. We note that δ_π
is a Bernoulli random variable, that is, it has a binomial distribution with m = 1 and,
therefore, its p.g.f. is 1 - π + πt. From (21.5),


$$P_{N_1}(t) = P_N(1 - \pi + \pi t).$$


This formula allows us to show that for our three major counting distributions, Poisson,
binomial, and negative binomial, N_1 is of the same type as N, but with a changed parameter.
If N ~ Poisson(λ), then


$$P_{N_1}(t) = e^{\lambda(1 - \pi + \pi t - 1)} = e^{\lambda\pi(t - 1)},$$


showing that N_1 ~ Poisson(λπ).
If N ~ Bin(m, p), then


$$P_{N_1}(t) = [1 - p + p(1 - \pi + \pi t)]^m = (1 - p\pi + p\pi t)^m,$$


showing that N_1 ~ Bin(m, pπ).

The last case is a bit trickier. Suppose N ~ Negbin(r, p). Can we expect N_1 to be a negative
binomial with changed parameters? Motivated by the binomial case, we might try leaving r the
same and modifying p. We cannot, however, take πp for the new value of p since that would
not give us the correct value of πrp/(1 - p) for E(N_1) that we must have by formula (A.46).
We will, however, at least get the right mean if we multiply a by π, where a = p/(1 - p) is the
alternate parameter to p mentioned in Section 21.5. This indeed is the right answer. To verify
this, it is convenient to express the p.g.f. of N in terms of this new parameter.
We can write


$$P_N(t) = \left(\frac{1 - p}{1 - pt}\right)^{r} = (1 + a - at)^{-r} = [1 - a(t - 1)]^{-r}.$$


So if M is the negative binomial with the first parameter r and the second (modified) parameter
aπ, then


$$P_{N_1}(t) = P_N(1 - \pi + \pi t) = [1 - a(\pi t - \pi)]^{-r} = P_M(t).$$


Therefore, reverting to our original parameter,


$$N_1 \sim \mathrm{Negbin}\left(r, \frac{p\pi}{1 - p + p\pi}\right).$$

21.8.2 Special classes in the Poisson case


Suppose now that we have two special classes, with the number of special claims in the two
classes denoted by N_1 and N_2, respectively. We can write down a formula similar to (21.9) for
the joint distribution of N_1 and N_2. If the probability of a claim in the first class is π_1 and that
of a claim in the second class is π_2, then


$$P(N_1 = k \text{ and } N_2 = m) = \sum_{r=0}^{\infty} \frac{(k + m + r)!}{k!\,m!\,r!}\,\pi_1^k\,\pi_2^m\,(1 - \pi_1 - \pi_2)^r\,p(k + m + r).$$


What do you suppose the covariance of N_1 and N_2 will be? We would naturally expect this
to be negative. A high value for one type of claim would seem to indicate that there are fewer
of the other type. Indeed, this will be the case for most distributions, but remarkably, for the
particular case where N is Poisson, the two distributions are independent. If N ~ Poisson(λ),
we obtain from the above


$$\begin{aligned}
P(N_1 = k \text{ and } N_2 = m) &= \sum_{r=0}^{\infty} \frac{(k + m + r)!}{k!\,m!\,r!}\,\pi_1^k\,\pi_2^m\,(1 - \pi_1 - \pi_2)^r\,\frac{e^{-\lambda}\lambda^{k+m+r}}{(k + m + r)!} \\
&= \frac{e^{-\lambda\pi_1}(\lambda\pi_1)^k}{k!} \cdot \frac{e^{-\lambda\pi_2}(\lambda\pi_2)^m}{m!} \cdot \sum_{r=0}^{\infty} \frac{e^{-\lambda(1 - \pi_1 - \pi_2)}\,[\lambda(1 - \pi_1 - \pi_2)]^r}{r!}.
\end{aligned}$$


The third term above equals 1, since it is the sum over all nonnegative integers of the
probability function for a Poisson(λ(1 - π_1 - π_2)) distribution. We know from above that
N_i ~ Poisson(λπ_i) for i = 1, 2, so


$$P(N_1 = k \text{ and } N_2 = m) = P(N_1 = k)\,P(N_2 = m),$$


and the independence result is proved. We can similarly show that if we have r special classes,
with N_i denoting the number of claims in class i, then the collection (N_1, N_2, ..., N_r) will be
independent.
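
The factorization can be checked numerically by evaluating the joint probability as a (truncated) multinomial sum over the number of remaining claims and comparing it with the product of two Poisson probabilities. In the sketch below, the parameter values and the truncation at 60 terms are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability function with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

def joint_pmf(k, m, lam, pi1, pi2, terms=60):
    """Truncated sum for P(N_1 = k and N_2 = m) when N ~ Poisson(lam)."""
    rest = 1 - pi1 - pi2                      # probability of a non-special claim
    total = 0.0
    for r in range(terms):
        n = k + m + r
        multinomial = factorial(n) // (factorial(k) * factorial(m) * factorial(r))
        total += multinomial * pi1 ** k * pi2 ** m * rest ** r * poisson_pmf(n, lam)
    return total

lam, pi1, pi2 = 4.0, 0.5, 0.3                 # illustrative values only
rows = []
for k in range(3):
    for m in range(3):
        product = poisson_pmf(k, lam * pi1) * poisson_pmf(m, lam * pi2)
        rows.append((k, m, joint_pmf(k, m, lam, pi1, pi2), product))
        print(*rows[-1])
```

Replacing the Poisson frequency by a binomial one in `joint_pmf` would, in general, break the agreement between the last two columns.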

This allows us to write certain compound Poisson distributions in an alternate form, which
is sometimes useful. Suppose that X takes finitely many values, say $x_1, x_2, \ldots, x_n$. Let $\pi_i$ denote
the probability that $X = x_i$. Let $N_i$ be the number of claims for amount $x_i$. We can then write

$$S = \sum_{i=1}^{n} x_i N_i.$$

This does not use the fact that N is Poisson and is true for any frequency distribution. The
problem is that, in general, this formulation is of little use. We may not be able to easily
identify the distribution of the various $N_i$ and, even if we can, such as in the binomial or
negative binomial cases, they will not be independent. In general, it can be very difficult to
deal with dependent sums. In the Poisson case, however, we know that $N_i \sim \text{Poisson}(\lambda\pi_i)$ and
that the $N_i$ are independent. The above expression is often easier to deal with than formula
(21.7) if we want to compute the exact distribution for the compound Poisson distribution
where X is finite-valued. We need only compute a single n-fold convolution.
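
As a rough illustration of this alternate form, the sketch below builds the probability function of S by convolving the distributions of the independent scaled Poisson counts, and compares it with the direct sum over n of p(n) times the n-fold convolution of f. It assumes positive integer claim amounts; the function names, the table size, and the numerical values are hypothetical choices made for the example.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)


def convolve(u, v, size):
    """Convolution of two probability tables, kept on 0..size-1."""
    out = [0.0] * size
    for i, ui in enumerate(u):
        if ui == 0.0:
            continue
        for j, vj in enumerate(v):
            if i + j >= size:
                break
            out[i + j] += ui * vj
    return out


def compound_poisson_by_classes(lam, severity, size):
    """P(S = s), s < size, using S = sum_i x_i * N_i with independent
    N_i ~ Poisson(lam * pi_i). severity maps amount x_i -> pi_i."""
    dist = [1.0] + [0.0] * (size - 1)
    for x, pi in severity.items():
        scaled = [0.0] * size
        for n in range((size - 1) // x + 1):
            scaled[n * x] = poisson_pmf(n, lam * pi)  # distribution of x * N_i
        dist = convolve(dist, scaled, size)
    return dist


def compound_poisson_direct(lam, severity, size):
    """P(S = s), s < size, via the sum over n of p(n) * f^{*n}(s)."""
    f = [0.0] * size
    for x, pi in severity.items():
        if x < size:
            f[x] = pi
    power = [1.0] + [0.0] * (size - 1)  # f^{*0}: point mass at 0
    out = [0.0] * size
    for n in range(size):  # amounts are >= 1, so n claims give S >= n
        for s in range(size):
            out[s] += poisson_pmf(n, lam) * power[s]
        power = convolve(power, f, size)
    return out


severity = {1: 0.5, 2: 0.3, 3: 0.2}  # illustrative values
by_classes = compound_poisson_by_classes(2.0, severity, 12)
direct = compound_poisson_direct(2.0, severity, 12)
print(max(abs(u - v) for u, v in zip(by_classes, direct)))
```

The first routine needs one convolution per claim amount, whereas the direct routine recomputes a convolution power for every value of n.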


21.9 The sum of two compound Poisson distributions


The sum of two independent Poisson distributions is itself Poisson distributed, as shown by
Example A.1 in Section A.11.2. We now show that the same statement is true for compound
Poisson distributions. Given two independent compound Poisson distributions, $S_1 = (N_1, X_1)$
and $S_2 = (N_2, X_2)$, where $N_1 \sim \text{Poisson}(\lambda_1)$ and $N_2 \sim \text{Poisson}(\lambda_2)$, let $S = S_1 + S_2$. Then

$$M_S(t) = M_{S_1}(t)\,M_{S_2}(t) = e^{\lambda_1[M_{X_1}(t) - 1]}\, e^{\lambda_2[M_{X_2}(t) - 1]}$$

$$= \exp\left\{(\lambda_1 + \lambda_2)\left[\frac{\lambda_1}{\lambda_1 + \lambda_2} M_{X_1}(t) + \frac{\lambda_2}{\lambda_1 + \lambda_2} M_{X_2}(t) - 1\right]\right\}.$$

It follows that $S \sim (N, X)$, where $N \sim \text{Poisson}(\lambda_1 + \lambda_2)$ and X is a mixture of $X_1$ and $X_2$ with
respective weights $\lambda_1/(\lambda_1 + \lambda_2)$ and $\lambda_2/(\lambda_1 + \lambda_2)$.

21.10 Deductibles and other modifications


Up to now we have assumed that the insurer pays the totality of any loss as given by the 
random variable X. In practice, the insurer often only covers part of the loss, leaving the 
insured to pay the remainder. The prime motivation is to make the insured party partially 
responsible, so that they have an interest in taking steps to avoid loss. 

This has major implications when we look at the statistical problem of inferring details 
about loss distributions from the data furnished by insurers on their claims experience. In 
practice, this data will give the amounts actually paid on claims, rather than the actual losses. 
In order to estimate loss distributions, one needs to understand clearly the relationship between 
the amount of the loss and the amount actually paid under the common types of modifications. 


21.10.1 The nature of a deductible 


One of the most common modification devices is a deductible. Under this arrangement, the 
insurer only pays the losses that are above some amount d fixed in advance. The purchaser 
of the insurance pays for the first d units of loss, and of course if the loss is less than d, the 
insurer pays nothing, and the insured is fully responsible. This has an added advantage to the 
insurer of preventing an undue expense involved in processing small claims. 

If the original severity distribution is given by the random variable X, and there is a
deductible of d, the amount actually paid by the insurer is

$$(X - d)_+ = \begin{cases} X - d, & \text{if } X \ge d, \\ 0, & \text{if } X < d. \end{cases}$$

It is not always easy to describe the exact distribution of this random variable given the
distribution of X, but it is relatively simple to compute its expectation. This is given for
continuous X by

$$E(X - d)_+ = \int_d^{\infty} (x - d) f_X(x)\,dx = \int_d^{\infty} s_X(x)\,dx, \tag{21.10}$$

where the second expression follows from the first by integrating by parts (or directly from
(A.15)).

Another random variable of interest, which is associated with the above, is

$$X \wedge d = \min(X, d) = \begin{cases} d, & \text{if } X \ge d, \\ X, & \text{if } X < d. \end{cases}$$


By looking separately at the case where X is less than or greater than d, it easily follows
that, in general,

$$X = (X - d)_+ + (X \wedge d),$$

so that

$$E(X - d)_+ = E(X) - E(X \wedge d). \tag{21.11}$$

For continuous X, it follows from (21.10) by calculating directly that

$$E(X \wedge d) = \int_0^d x f_X(x)\,dx + d\,s_X(d) = \int_0^d s_X(x)\,dx. \tag{21.12}$$


Example 21.4 X takes the values 100, 200, 300, 400 with probabilities 0.4, 0.3, 0.2, 0.1,
respectively. Describe the distributions of $(X - d)_+$ and $X \wedge d$ for d = 230. Find $E(X - 230)_+$
and $E(X \wedge 230)$.


Solution. $(X - 230)_+$ takes the value 0 with probability 0.7, 70 with probability 0.2, and
170 with probability 0.1, while $X \wedge 230$ takes the value 100 with probability 0.4, 200 with
probability 0.3, and 230 with probability 0.3. Calculating directly,

$$E(X \wedge 230) = 100 \times 0.4 + 200 \times 0.3 + 230 \times 0.3 = 169,$$

$$E(X - 230)_+ = 70 \times 0.2 + 170 \times 0.1 = 31.$$

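
The arithmetic of this example is easy to reproduce. The sketch below is an illustration only; the dictionary representation of the distribution and the function names are assumptions introduced here. It computes both expectations for a finite discrete severity and confirms that they add up to E(X), as the decomposition of X into the excess and the limited loss requires.

```python
def expected_excess(dist, d):
    """E(X - d)_+ for a finite discrete distribution {value: probability}."""
    return sum(p * max(x - d, 0) for x, p in dist.items())


def expected_limited(dist, d):
    """E(X ^ d) = E[min(X, d)] for the same kind of distribution."""
    return sum(p * min(x, d) for x, p in dist.items())


dist = {100: 0.4, 200: 0.3, 300: 0.2, 400: 0.1}  # severity of Example 21.4
mean = sum(p * x for x, p in dist.items())
excess = expected_excess(dist, 230)
limited = expected_limited(dist, 230)
print(excess, limited, mean)
```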
It is often convenient to compute $E(X - d)_+$ from formula (21.12). This is particularly
true when X is infinite and discrete. We can compute $E(X \wedge d)$ by a finite sum, as opposed to
the infinite series involved in computing $E(X - d)_+$ directly. Here is a typical example.


Example 21.5 Suppose $X \sim \text{Geom}(p)$. Find $E(X - 1/2)_+$.


Solution. Since X is always greater than 1/2 unless it is equal to 0, we know that $X \wedge 1/2$ takes
the value 0 with probability $1 - p$ and the value 1/2 with probability p. So $E(X \wedge 1/2) = p/2$.
Since $E(X) = p/(1 - p)$, we conclude that

$$E(X - 1/2)_+ = \frac{p}{1 - p} - \frac{p}{2}.$$

21.10.2 Some calculations in the discrete case 


Suppose that X takes integer values. The integrals involving $f_X$ in (21.10) and (21.12) must
be replaced by summations with the same limits, and with $f_X$ now equal to the probability
function. They apply only to integer values of d. The expressions involving $s_X$, however,
remain valid as is and apply to any value of d. Of course, in this case, $s_X$ is a step function
and the integral may be rewritten as a sum. For example, if $k \le d < k + 1$ for some integer k,
then

$$E(X - d)_+ = \int_d^{\infty} s_X(x)\,dx = (k + 1 - d)\,s_X(k) + s_X(k + 1) + s_X(k + 2) + \cdots. \tag{21.13}$$


To illustrate, look again at Example 21.4, except we take a unit to be 100, so now X takes the
values 1, 2, 3, 4. We have s(2) = 0.3, s(3) = 0.1, s(4) = 0, and

$$E(X - 2.3)_+ = 0.7 \times 0.3 + 0.1 = 0.31,$$

as above.

It is quite simple to compute all values of $E(X - d)_+$ in the case of an integer-valued X.
When d is an integer, (21.13) just says that

$$E(X - d)_+ = \sum_{k=d}^{\infty} s_X(k),$$

and we get the recursion formula

$$E(X - (d + 1))_+ = E(X - d)_+ - s_X(d), \tag{21.14}$$

where we start the recursion with $E(X - 0)_+ = E(X)$. Noninteger values of d are computed
exactly by linear interpolation, since (21.13) immediately implies that, for d = k + r where k
is an integer and 0 < r < 1,

$$E(X - d)_+ = (1 - r)\,E(X - k)_+ + r\,E(X - (k + 1))_+.$$


Example 21.6 Suppose that the probability function of X takes the values f(0) = 0.2, f(1) =
0.2, f(2) = 0.3, f(3) = 0.1, f(4) = 0.2. Find $E(X - d)_+$ for d = 0, 1, 2, 3, 4, and 2.6.

Solution. We first calculate s(0) = 0.8, s(1) = 0.6, s(2) = 0.3, s(3) = 0.2, s(4) = 0, and

$$E(X) = \sum_{k=0}^{4} s(k) = 1.9,$$

so that $E(X - 0)_+ = 1.9$, $E(X - 1)_+ = 1.9 - 0.8 = 1.1$, $E(X - 2)_+ = 1.1 - 0.6 = 0.5$,
$E(X - 3)_+ = 0.5 - 0.3 = 0.2$, $E(X - 4)_+ = 0.2 - 0.2 = 0$. This final value of 0 serves as a check that
we have done the recursion correctly. Finally, we have

$$E(X - 2.6)_+ = 0.4\,E(X - 2)_+ + 0.6\,E(X - 3)_+ = 0.32.$$
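
The recursion and the interpolation step translate directly into a few lines of code. This is a minimal sketch under stated assumptions: X takes the values 0, 1, ..., len(f) - 1 with probabilities f[k], the deductible d is nonnegative, and the function names are introduced here for illustration.

```python
from math import floor


def excess_table(f):
    """Values of E(X - d)_+ for d = 0, 1, ..., len(f) - 1, built by the
    recursion E(X - (d+1))_+ = E(X - d)_+ - s(d), starting from E(X)."""
    s = []
    tail = 1.0
    for fk in f:
        tail -= fk
        s.append(tail)       # s(k) = P(X > k)
    values = [sum(s)]        # E(X) is the sum of the survival function
    for d in range(len(f) - 1):
        values.append(values[-1] - s[d])
    return values


def expected_excess(f, d):
    """E(X - d)_+ for any d >= 0, by linear interpolation between integers."""
    table = excess_table(f)
    k = floor(d)
    if k >= len(table) - 1:  # d is at or beyond the largest value of X
        return 0.0
    r = d - k
    return (1 - r) * table[k] + r * table[k + 1]


f = [0.2, 0.2, 0.3, 0.1, 0.2]  # probability function of Example 21.6
table = excess_table(f)
value = expected_excess(f, 2.6)
print(table, value)
```

The last entry of the table should come out as zero up to rounding, which is the same check used in the worked solution.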


21.10.3 Some calculations in the continuous case 


In general, $(X - d)_+$ will have a point mass at 0, as it will take a value of 0 with probability
$F_X(d)$. Therefore, if X is continuous, $(X - d)_+$ will be a mixed distribution. In such cases,
we may want to proceed as in Section 21.7 and consider the random variable $(X - d)^+ =
(X - d) \mid X > d$. We then have that $(X - d)_+$ is a mixture of 0 and $(X - d)^+$ with weights
$F_X(d)$ and $s_X(d)$, respectively. It follows that

$$E(X - d)_+ = E((X - d)^+)\,s_X(d). \tag{21.15}$$

Example 21.7 If $X \sim \text{Exp}(\lambda)$, what is the distribution of $(X - d)^+$?


Solution. If Y denotes $(X - d)^+$, then

$$s_Y(y) = \frac{s_X(y + d)}{s_X(d)} = \frac{e^{-\lambda(y + d)}}{e^{-\lambda d}} = e^{-\lambda y},$$

so that Y has exactly the same distribution as X. This only happens with an exponential
distribution. The result at first glance seems surprising. It says that no matter how high the
deductible is, the excess of the loss over the deductible is distributed as the original loss. The
point to keep in mind is that Y is conditioned on the loss being above the deductible, which
of course will have a very small chance of occurring for high values of d.


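Example 21.8 If $X \sim \text{Pareto}(\theta, \alpha)$, what is the distribution of $(X - d)^+$?

Solution. If Y denotes $(X - d)^+$,

$$s_Y(y) = \frac{s_X(y + d)}{s_X(d)} = \left(\frac{\theta}{y + d + \theta}\right)^{\alpha} \Big/ \left(\frac{\theta}{d + \theta}\right)^{\alpha} = \left(\frac{d + \theta}{y + d + \theta}\right)^{\alpha},$$

and we see that

$$Y \sim \text{Pareto}(\theta + d, \alpha).$$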


In general, we will not be able to easily identify the distributions associated with the 
deductible d as we did in the above two examples, although in some cases, we may be able to 
compute expectations. This is true for the gamma distribution with first parameter 2. 


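Example 21.9 If $X \sim \text{Gamma}(2, \beta)$, find $E[(X - d)^+]$ and $E[(X - d)_+]$.

Solution. Integrating by parts,

$$\int_x^{\infty} y e^{-\beta y}\,dy = \left(\frac{x}{\beta} + \frac{1}{\beta^2}\right) e^{-\beta x},$$

so that

$$s_X(x) = \beta^2 \int_x^{\infty} y e^{-\beta y}\,dy = (1 + \beta x)\,e^{-\beta x}.$$

From (21.13),

$$E[(X - d)_+] = \int_d^{\infty} (1 + \beta x)\,e^{-\beta x}\,dx = \frac{1}{\beta} e^{-\beta d} + \left(d + \frac{1}{\beta}\right) e^{-\beta d} = \left(\frac{2}{\beta} + d\right) e^{-\beta d},$$

and from (21.15),

$$E[(X - d)^+] = \frac{E[(X - d)_+]}{s_X(d)} = \frac{1}{\beta}\left(\frac{2 + \beta d}{1 + \beta d}\right).$$

As a check, both these expectations reduce to E(X) when d = 0.

21.10.4 The effect on aggregate claims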


Up to now we have focused on the severity distribution X. We now turn our attention to the 
effect of a deductible on the aggregate claim distribution S. It is important to distinguish two 
situations. 

In one case, the deductible is applied directly to aggregate claims. This could arise from 
a reinsurance arrangement. It is often the case that a party known as a reinsurer agrees to 
cover part of the losses of the original insurer in return for a premium. In one common type 
of arrangement, the reinsurer would impose a deductible on the aggregate claims for a certain 
portfolio. This is known as stop-loss reinsurance. In such a case, we are interested in the
random variable $(S - d)_+$, which will be the amount paid by the reinsurer.

In the second situation, the deductible d is applied to each individual claim. There are
two ways to proceed here. The obvious way is to simply note that in place of the distribution
$S \sim (N, X)$, which applies without the deductible, we are now interested in the distribution

$$S' \sim (N, (X - d)_+). \tag{21.16}$$

There is an alternate representation for $S'$ that is useful with certain distributions. Let us
motivate this by asking the following deep philosophical question. Is a claim for an amount
of 0 really a claim? The simple answer is that it either is or is not, depending on which way
you want it. In the first representation of $S'$, there will be claims for zero amount, namely those
that are less than the deductible and for which nothing is reimbursed. Suppose we decide
not to count these as claims. Our severity will then be distributed as $(X - d)^+$ rather than
$(X - d)_+$, since we now only consider a claim to have occurred if it is over the deductible. We
must then, however, also change the distribution N to count only those claims for an amount
above the deductible. We know how to do this from Section 21.8. The special claims in this
case are those for an amount greater than d. Under this approach, we have

$$S' \sim (N_\pi, (X - d)^+), \qquad \pi = s_X(d). \tag{21.17}$$


To use (21.17) effectively, we have to know that claim frequency and severity remain 
independent when we make these transformations. A general proof of this fact becomes 
somewhat involved in notation, and we will not present it, but the idea is straightforward as 
the following example illustrates. 


Example 21.10 Suppose that N takes the values 0, 1, 2, and let p(i) be the probability
that N = i. Suppose that X takes the values $x_1, x_2, x_3$, where $x_1 < d$ and $x_2$ and $x_3$ are greater
than d. Let $N_\pi$ denote the number of claims for an amount greater than d. Show that

$$P[N_\pi = 1 \text{ and } (X - d)^+ = x_2 - d] = P[N_\pi = 1]\,P[(X - d)^+ = x_2 - d].$$

Solution. Let a, b, c denote respectively the probability that X takes the values $x_1, x_2, x_3$.

For the event on the left to occur we need either one claim for an amount of $x_2$ or
two claims, where one is of amount $x_2$ and the other is of amount $x_1$. The required
probability then is p(1)b + p(2)2ab. Now, for $N_\pi = 1$ we need either one claim for an amount
of either $x_2$ or $x_3$, or two claims where one is for $x_1$ and the other is for either $x_2$ or $x_3$.
So $P[N_\pi = 1] = p(1)(b + c) + p(2)2a(b + c)$. Moreover $P[(X - d)^+ = x_2 - d] = b/(b + c)$.
Multiplying the last two quantities gives the first.


Use of (21.17) in place of (21.16) works particularly well when N is one of the three basic
cases of Poisson, binomial, or negative binomial, where we know what $N_\pi$ is, and when
we know what $(X - d)^+$ is, as in the exponential or Pareto distributions for severity.


Example 21.11 Suppose that $N \sim \text{Poisson}(2)$ and $X \sim \text{Exp}(3)$. A deductible of 2 is applied
to each claim. Find the variance of the resulting distribution of aggregate claims.


Solution. We know that $(X - 2)^+$ has the same distribution as X and therefore a second
moment of 2/9. We replace the original N by a Poisson($2e^{-6}$) distribution, and by using (21.8),
the resulting variance is $\frac{4}{9}e^{-6}$.
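
As a cross-check, the same value comes out of representation (21.16). The short derivation below is a sketch that assumes the standard compound Poisson variance formula, Var(S) = λE(X²), which we take to be the content of (21.8). Here $(X-2)_+$ denotes the payment per loss and $(X-2)^+$ the payment given that the loss exceeds 2.

```latex
\[
\mathrm{Var}(S')
  = 2\,E\!\left[\big((X-2)_+\big)^2\right]
  = 2\,s_X(2)\,E\!\left[\big((X-2)^+\big)^2\right]
  = 2\,e^{-6}\cdot\frac{2}{9}
  = \frac{4}{9}\,e^{-6}.
\]
```

Both routes agree because thinning the Poisson count by the factor $s_X(2) = e^{-6}$ has the same effect on the variance as weighting the second moment of the severity by that factor.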


21.10.5 Other modifications 


Another method of modifying the original claim amount is to set a maximum value m. The
insurer will pay at most m regardless of the actual value of the loss. If X is the original
severity distribution, the amount paid on a claim would then be $X \wedge m$. There could be both
a deductible d and a maximum m imposed. The amount paid on a claim in this situation
would be $X \wedge (m + d) - X \wedge d$. Yet another modification is for the insurer to pay only a certain
percentage of the loss. The amount paid on a claim will now be aX, for some 0 < a < 1. In
doing calculations where all these modifications are present, it is useful to keep in mind the
fact that

$$a(X \wedge d) = aX \wedge ad.$$


Example 21.12 A policy will cover 80% of all losses in excess of 100, with the further
provision that a maximum payment of 900 will be made regardless of the amount of the loss.
Express the amount paid on a claim with terms of the form $X \wedge d$.


Solution. The amount paid is $0.8(X - 100)_+$, provided that 0.8(X - 100) < 900, which will
occur for X < 1225. We can express this as

$$0.8(X \wedge 1225) - 0.8(X \wedge 100).$$
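
A quick numerical comparison shows that the expression in terms of limited losses reproduces the policy terms. The sketch below is illustrative only; the two function names and the test losses are assumptions made for this check.

```python
def payment_from_terms(x):
    """Payment read off the policy wording: 80% of the loss above 100,
    capped at a maximum payment of 900."""
    return min(0.8 * max(x - 100, 0), 900)


def payment_from_limited_losses(x):
    """The same payment written as 0.8 (X ^ 1225) - 0.8 (X ^ 100)."""
    return 0.8 * min(x, 1225) - 0.8 * min(x, 100)


losses = [0, 50, 100, 600, 1225, 5000]  # arbitrary test losses
pairs = [(payment_from_terms(x), payment_from_limited_losses(x)) for x in losses]
print(pairs)
```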


21.11 A recursion formula for S 


21.11.1 The positive-valued case 


We suppose that X takes positive integers as values, and we seek a recursion formula to
compute the probability function of S. It turns out that this is possible provided that the
probability function of N satisfies a certain recursion. The required property is that for some
constants a and b,

$$p(k) = \left(a + \frac{b}{k}\right) p(k - 1), \qquad k = 1, 2, 3, \ldots. \tag{21.18}$$

To develop our recursion formula, we first need some identities for convolutions of $f = f_X$.
By definition, we know that for all n and all positive integers x,

$$f^{*n}(x) = \sum_{i=1}^{x-1} f(i)\,f^{*(n-1)}(x - i). \tag{21.19}$$

There is, however, another curious identity relating convolutions.


Proposition 21.1 For all n and all positive integer values of x,

$$f^{*n}(x) = \frac{n}{x} \sum_{i=1}^{x-1} i\,f(i)\,f^{*(n-1)}(x - i).$$


Proof. Let $X_1, X_2, \ldots, X_n$ be independent and each distributed as X, and let $A = \sum_{j=1}^{n} X_j$.
Then, for any positive integer i < x,

$$P(X_1 = i \mid A = x) = \frac{P(X_1 = i \text{ and } A = x)}{P(A = x)} = \frac{f(i)\,f^{*(n-1)}(x - i)}{f^{*n}(x)},$$

since in order for the event in the numerator to occur, $X_1 = i$ and the other $n - 1$ random
variables add up to $x - i$. It follows that

$$E(X_1 \mid A = x) = \sum_{i=1}^{x-1} \frac{i\,f(i)\,f^{*(n-1)}(x - i)}{f^{*n}(x)}.$$

There is, however, nothing special about $X_1$, and, by symmetry, we get exactly the same
equality, with $X_1$ replaced by $X_j$ for j = 2, 3, ..., n. Add up this last equality for all n values
of j. The left hand side is just $E(A \mid A = x)$, which is simply x. The right hand side of each
equation is a constant that gets multiplied by n in the sum. Equating and rearranging gives the
stated identity.


In the remainder of this section, we will let g denote the probability function of S. 


Theorem 21.1 (The recursion formula) Suppose that p satisfies (21.18). Then, for all
positive integers k,

$$g(k) = \sum_{i=1}^{k} \left(a + \frac{bi}{k}\right) f(i)\,g(k - i).$$

Proof. For any positive integer k, we have

$$g(k) = \sum_{n=1}^{k} p(n)\,f^{*n}(k) = a \sum_{n=1}^{k} p(n - 1)\,f^{*n}(k) + b \sum_{n=1}^{k} \frac{p(n - 1)\,f^{*n}(k)}{n}. \tag{21.20}$$

Consider the term multiplying a in the above, which is

$$p(0)f(k) + p(1)f^{*2}(k) + p(2)f^{*3}(k) + p(3)f^{*4}(k) + \cdots + p(k - 1)f^{*k}(k).$$

By applying (21.19) to each summand (after the first), we can write this as

$$\begin{array}{l}
p(0)f(k) + p(1)f(1)f(k-1) + p(2)f(1)f^{*2}(k-1) + p(3)f(1)f^{*3}(k-1) + \cdots \\
\qquad\quad\;\, + \; p(1)f(2)f(k-2) + p(2)f(2)f^{*2}(k-2) + p(3)f(2)f^{*3}(k-2) + \cdots \\
\qquad\quad\;\, + \; p(1)f(3)f(k-3) + p(2)f(3)f^{*2}(k-3) + p(3)f(3)f^{*3}(k-3) + \cdots \\
\qquad\quad\;\, + \; \cdots
\end{array}$$

The sum of the first row, excluding the leading term p(0)f(k), is

$$f(1)\left[p(1)f(k-1) + p(2)f^{*2}(k-1) + p(3)f^{*3}(k-1) + \cdots + p(k-1)f^{*(k-1)}(k-1)\right] = f(1)\,g(k - 1).$$

Similarly, the sum of the ith row is just $f(i)\,g(k - i)$. The leading term f(k)p(0) equals f(k)g(0),
since, given the restriction of strictly positive values for X, the only way for S to be equal to
0 is if N = 0. The sum of the entire array is then simply

$$\sum_{i=1}^{k} f(i)\,g(k - i). \tag{21.21}$$

Next, consider the term multiplying b in (21.20). We do exactly what we did above except
we use the identity in Proposition 21.1 in place of (21.19). In this case, the term from (21.20)
introduces a coefficient of 1/n in the column of the array involving $p(n - 1)$. Had we used
(21.19), we could not have conveniently summed along rows. The beauty of the other identity
is that it introduces a term n/k in the column involving $p(n - 1)$ that conveniently cancels
with the 1/n. The sum in this case is the same as (21.21), except that it is multiplied by 1/k
and $i f(i)$ replaces $f(i)$, giving

$$\frac{1}{k} \sum_{i=1}^{k} i\,f(i)\,g(k - i).$$

Substituting in (21.20) gives the recursion formula.


The starting value for the recursion is given by g(0) = p(0), as indicated above. 

Of course, in order that this formula be useful, we need to know that there are distributions 
of N that satisfy the given condition on p. Fortunately, this occurs for our three main families. 

If $N \sim \text{Poisson}(\lambda)$, then

$$\frac{p(k)}{p(k - 1)} = \frac{\lambda}{k},$$

so (21.18) holds with a = 0, $b = \lambda$.

If $N \sim \text{Bin}(m, p)$, then

$$\frac{p(k)}{p(k - 1)} = \frac{m - k + 1}{k} \cdot \frac{p}{1 - p},$$

so that (21.18) holds with $a = -p/(1 - p)$, $b = (m + 1)p/(1 - p)$.

If $N \sim \text{Negbin}(r, p)$, then

$$\frac{p(k)}{p(k - 1)} = \frac{r + k - 1}{k}\,p,$$

so that (21.18) holds with a = p, $b = (r - 1)p$.
What are the other possibilities? It turns out that there are none, and that, remarkably, only 
these three families satisfy the required recurrence relation. 


Theorem 21.2 Suppose that N is a nonnegative-valued random variable satisfying (21.18).
Then:


1. if a = 0, $N \sim \text{Poisson}(b)$;
2. if a > 0, $N \sim \text{Negbin}(b/a + 1, a)$;
3. if a < 0, $N \sim \text{Bin}(m, -a/(1 - a))$ for some positive integer m.


Proof.


(i) If a = 0, then clearly $p(k) = (b^k/k!)\,p(0)$. This gives

$$1 = \sum_{k=0}^{\infty} p(k) = p(0) \sum_{k=0}^{\infty} \frac{b^k}{k!} = p(0)\,e^{b},$$

so that $p(0) = e^{-b}$. Substituting this into the expression for p(k), we see that
$N \sim \text{Poisson}(b)$.


(ii) Suppose a > 0. Note first that p(0) cannot be 0. For, if so, then all p(k) would
be 0, and we would not have a probability distribution. Let r = (b/a) + 1. Then
r > 0, for otherwise we have p(1) = (a + b)p(0) < 0. Writing $b = (r - 1)a$, we
calculate inductively $p(1) = ra\,p(0)$, $p(2) = [r(r + 1)/2]\,a^2 p(0)$, ...,
$p(k) = [r(r + 1) \cdots (r + k - 1)/k!]\,a^k p(0)$, .... This gives

$$1 = \sum_{k=0}^{\infty} p(k) = p(0) \sum_{k=0}^{\infty} \frac{r(r + 1) \cdots (r + k - 1)}{k!}\,a^k = p(0)(1 - a)^{-r},$$

so that $p(0) = (1 - a)^r$, showing that a < 1. By substituting this into the expression for
p(k), we see that $N \sim \text{Negbin}(r, a)$.

(iii) Suppose that a < 0. Since a + (b/k) will become negative for sufficiently large
k, it must necessarily become zero for some value of k. If not, we would
eventually get negative probabilities. Therefore, we must have that b > 0 and
$a = -b/(m + 1)$ for some positive integer m. This means that p(k) = 0 for k > m.
We note that $a + b/k = -a(m - k + 1)/k$, so that $p(1) = -am\,p(0)$,
$p(2) = (-a)^2[m(m - 1)/2!]\,p(0)$, ..., $p(k) = (-a)^k[m(m - 1) \cdots (m - k + 1)/k!]\,p(0)$, .... Then

$$1 = \sum_{k=0}^{m} p(k) = p(0)(1 - a)^m.$$

Let $p = -a/(1 - a)$. Then $-a = p/(1 - p)$, so that $p(0) = (1 - p)^m$, and by substituting
this into the expression for p(k), we see that $N \sim \text{Bin}(m, p)$.


Example 21.13 Go back to the coin-dice problem that started this chapter and illustrate
that you can find these probabilities by recursion.


Solution. In this case $N \sim \text{Bin}(2, 0.5)$, so we have a = -1, b = 3. Moreover, f(i) = 1/6 for
i = 1, 2, ..., 6. We will do the first few calculations here to illustrate the procedure.

$$g(0) = p(0) = \frac{1}{4}, \qquad g(1) = \frac{1}{6}\,[2g(0)] = \frac{1}{12},$$

$$g(2) = \frac{1}{6}\left[\frac{1}{2}\,g(1) + 2g(0)\right] = \frac{13}{144}, \qquad g(3) = \frac{1}{6}\,[g(1) + 2g(0)] = \frac{7}{72}.$$
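
The recursion of Theorem 21.1 is short enough to code directly. The sketch below is one possible implementation, not the book's own (the text works with spreadsheets); the function name and argument layout are assumptions. It uses exact fractions so that the values for the coin-dice problem can be compared with the hand calculation.

```python
from fractions import Fraction


def aggregate_pmf(a, b, g0, f, k_max):
    """Return [g(0), ..., g(k_max)] from g(k) = sum_i (a + b*i/k) f(i) g(k-i).

    f[i] is P(X = i) for i = 0..len(f)-1, with f[0] == 0 (positive claims only).
    g0 is the starting value g(0) = p(0).
    """
    g = [g0]
    for k in range(1, k_max + 1):
        total = 0
        for i in range(1, min(k, len(f) - 1) + 1):
            total += (a + b * i / k) * f[i] * g[k - i]
        g.append(total)
    return g


# Coin-dice setup of Example 21.13: N ~ Bin(2, 0.5), X uniform on 1..6.
a, b = Fraction(-1), Fraction(3)
f = [Fraction(0)] + [Fraction(1, 6)] * 6
g = aggregate_pmf(a, b, Fraction(1, 4), f, 12)
print(g[:4])
```

Since S cannot exceed 12 in this problem, the thirteen values g(0), ..., g(12) should sum to exactly 1, which gives a convenient check on the whole table.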


21.11.2 The case with claims of zero amount 


There is a more general recursion formula that allows for the possibility that X can take a
value of 0. This is

$$g(k) = \frac{1}{1 - a f(0)} \sum_{i=1}^{k} \left(a + \frac{bi}{k}\right) f(i)\,g(k - i), \tag{21.22}$$

which reduces to that given above when f(0) = 0. This can be derived by a suitable modification
of the proof of Theorem 21.1. We note that there will be two extra rows in the first array,
which we used to compute the sum of the term multiplying a. One row at the beginning will
have terms of the form $f(0)f^{*n}(k)$, and one at the end will have terms of the form $f(k)f^{*n}(0)$.
The beginning row will just sum to f(0)g(k). The ending row will combine with the leading
term to sum to f(k)g(0). In this case, g(0) is not equal to p(0). In the second calculation, when
we compute the summation multiplying b, we only get this row at the end, in view of the
extra coefficient of i, which makes entries in the new first row equal to 0. The final conclusion
is that

$$g(k) = a f(0)\,g(k) + \sum_{i=1}^{k} \left(a + \frac{bi}{k}\right) f(i)\,g(k - i),$$

which leads to (21.22).

The disadvantage of using (21.22) is that we have a more complicated calculation for the
initial value than before:

$$g(0) = \sum_{k=0}^{\infty} p(k)\,f(0)^k.$$

An easier procedure is to follow the alternate method we mentioned for handling per-claim
deductibles in the previous section. That is, we simply get rid of the zero claims by counting
only the positive ones. From Section 21.8, we know how to modify N in all the relevant cases.
We must also replace X by $X^+$, but that is done simply by multiplying each probability by
$1/(1 - f(0))$.


Example 21.14 Suppose $N \sim \text{Negbin}(0.5, 0.4)$. X takes the values 0, 1, 2 with probabilities
0.25, 0.35, 0.40, respectively. Write down a recursion formula for computing g.


Solution. We follow the second procedure. The probability of a nonzero claim is 0.75.
Recall that in the negative binomial, the ratio $p/(1 - p)$ gets multiplied by this probability
and changes from 2/3 to 1/2. So the new value of p is 1/3, and we have a = 1/3, b = -1/6.
The random variable $X^+$ takes the values 1, 2 with probabilities 7/15, 8/15, respectively. The
recursion becomes

$$g(0) = \left(\frac{2}{3}\right)^{1/2},$$

$$g(k) = \left(\frac{1}{3} - \frac{1}{6k}\right) \frac{7}{15}\,g(k - 1) + \left(\frac{1}{3} - \frac{1}{3k}\right) \frac{8}{15}\,g(k - 2).$$
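
The two procedures can be compared numerically. The sketch below is illustrative: the function name is an assumption, and the starting value for the first route uses the closed form of the negative binomial probability generating function evaluated at f(0), which is the sum in the initial-value formula written compactly. It runs recursion (21.22) on the original N and X, then the simpler recursion on the modified pair, and the two tables should agree.

```python
def aggregate_pmf_general(a, b, g0, f, k_max):
    """[g(0), ..., g(k_max)] via the recursion that allows f[0] = P(X = 0) > 0.

    With f[0] == 0 the divisor is 1 and this is the positive-valued recursion.
    """
    g = [g0]
    for k in range(1, k_max + 1):
        total = sum((a + b * i / k) * f[i] * g[k - i]
                    for i in range(1, min(k, len(f) - 1) + 1))
        g.append(total / (1.0 - a * f[0]))
    return g


r, p = 0.5, 0.4
f = [0.25, 0.35, 0.40]

# Route 1: keep the zero claims; g(0) is the p.g.f. of N evaluated at f(0).
g0_direct = ((1 - p) / (1 - p * f[0])) ** r
direct = aggregate_pmf_general(p, (r - 1) * p, g0_direct, f, 10)

# Route 2: drop the zero claims, as in Example 21.14.
p_new = 1 / 3
f_plus = [0.0, 7 / 15, 8 / 15]
reduced = aggregate_pmf_general(p_new, (r - 1) * p_new, (1 - p_new) ** r, f_plus, 10)

print(max(abs(u - v) for u, v in zip(direct, reduced)))
```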


Notes and references 


The reader is cautioned that some authors use an alternative definition of the negative binomial 
random variable. In the formulation in terms of repeated trials, they would count the total 
number, rather than just the successes. Their random variable would then be equal to N +r, 
where N is the definition that we have adopted. Moreover, parameters chosen for the different 
distributions are not standardized and different choices are made by various authors. We 
indicated an alternate choice for the negative binomial. Another example occurs with the 
exponential distribution. While we chose to parametrize this by the hazard rate, some may 
use the mean and take the parameter to be the reciprocal of ours. This is carried forward to 
the gamma distributions. So, for example, what we call a Gamma($\alpha, \beta$) distribution could be
termed a Gamma($\alpha, \beta^{-1}$) distribution by others.


Exercises 


21.1 If N ~ Binomial(9, 1/3) and X ~ Gamma(2, 0.5), find E(S) and Var(S). 


21.2 If N takes the values 0, 1, 2, 3 with probabilities 0.3, 0.4, 0.2, and 0.1 respectively, and
X takes values 1, 2, 3, 4, 5 each with probability 0.2, find the probability that S = 4. 


21.3 If N takes values 0, 1, 2, 3 with probabilities 0.5, 0.3, 0.1, 0.1 respectively, and X
takes values 1, 2, 3, 4, 5, 6 with probabilities 0.2, 0.2, 0.2, 0.2, 0.1, 0.1 respectively,
find the probability that S = 6.


21.4 Suppose that N is negative binomial with mean 4 and variance 12, and X is exponentially
distributed. Let $N_\pi$ denote the number of claims that are less than the average
claim amount. Find the variance of $N_\pi$.


21.5 The frequency of accidents for automobile drivers over a certain period follows a
Poisson distribution. Good drivers can expect to have on average one accident over
that period, while bad drivers can expect to have two accidents. It is estimated that
80% of drivers are good and 20% are bad. If an accident occurs, the claim amount
is exponentially distributed with a mean of 100. Calculate the expected value and
variance of the aggregate claims over this period.


21.6 Suppose that $N \sim \text{Negbin}(2, 0.8)$ and $X \sim \text{Gamma}(4, 3)$. Find E(S) and Var(S).


21.7 Suppose that N takes the values 0, 1, 2 with probabilities 0.5, 0.3, 0.2 respectively,
and X takes values 1, 2, 3, with probabilities 0.3, 0.6, 0.1 respectively.

(a) Find the probability that S = 3.
(b) Let $N_i$ = the number of claims of size i, for i = 1, 2, 3. Are $N_1$ and $N_2$ independent?


21.8 Suppose that N is a continuous mixture of Poisson($\lambda$) distributions where $\lambda \sim \text{Gamma}(3, 2)$.
Find E(N) and Var(N).


21.9 Suppose that N takes the values 0, 1, 2, 3 with probabilities 0.4, 0.3, 0.2, 0.1 respectively,
and that X takes the values 10 with a probability of 0.5, 20 with a probability
of 0.3, 30 with a probability of 0.1, and various other values, all higher than 30, with
a total probability of 0.1. (You are not given these values.)

(a) Find the probability that S = 30.
(b) Find $E[(S - 15)_+]$, given that E(X) = 20.


21.10 Suppose $N \sim \text{Poisson}(2)$ and $X \sim \text{Exp}(3)$.

(a) Find E(S) and Var(S).
(b) Find $M_S(1)$, where $M_S$ is the m.g.f. of S.


21.11 The number of customers arriving at a restaurant is Poisson distributed with a mean of
15 per hour. The amount that each customer spends is exponentially distributed with
an average of 20. The restaurant is open 16 hours each day and the daily expenses
are 4500. Using a normal approximation, estimate the probability that on a given day,
revenue will cover expenses.


21.12 Suppose that N is a continuous mixture of Poisson distributions where the mean is
itself a random variable. Find P(N = 0) in each of the following cases.

(a) The mean is uniformly distributed on the interval [1, 3].
(b) The mean has a Gamma(2, 3) distribution.


21.20


Suppose we have two mutually exclusive special classes of claims $N_1$ and $N_2$. Show,
by an example, that if N has a binomial distribution, then $N_1$ and $N_2$ need not be
independent.


Suppose that the probability of n claims is 1/2"*! and the probability is e~?* that a 
given claim will be greater than x. What is the probability that the aggregate claims 
will be less than or equal to log (10)? 


A policy will cover 75% of all losses in excess of 240 with the further provision that
a maximum payment of 2700 will be made regardless of the loss. Express the amount
paid on a claim with terms of the form $X \wedge d$, where X is the actual loss.


Each hour, vehicles pass a certain point on a highway in accordance with a Poisson 
distribution. The expected number of vehicles that pass during the hour is four. Assume 
that one half of all passing vehicles are trucks, and one quarter are sports cars. Find 
the probability that, in a given hour, the passing vehicles include exactly two trucks 
and exactly one sports car. There is no restriction on the number of vehicles other 
than trucks and sports cars (so, for example, the event of two trucks, one sports car and
seven other vehicles would satisfy the given condition).


Suppose that N is geometric with mean 1, and X takes the values 10 or 20 with
equal probability. Find $E(S - 30)_+$.


For a certain insurer, N has a Poisson distribution, and X is exponentially distributed. 
Each claim is subject to a deductible of d. If d = 2, the expected amount paid by 
the insurer is equal to 100. If d = 3, this expected payout reduces to 50. What is the
expected payout if d = 1? 


The density function of X is given by

$$f(x) = \frac{10 - x}{50}, \qquad 0 < x < 10.$$

In order to reduce the expected amount paid, the insurer is considering two possibilities.
One is to introduce a deductible of two per claim. The other is to pay only a
maximum of 7 per claim. Which scheme should they adopt if they want to minimize
the expected payout?


The manufacturer of a television set costing 1000, offers a guarantee to repair or 
replace the set for free for the first year. The number of defective sets follows a 
Poisson distribution with mean 4. Half the defective sets require replacement and half 
require a repair costing 500. The manufacturer purchases an insurance policy that 
will cover the total cost of this guarantee above 1500. (So, for example, if there were 
four defective sets and each required replacement the insurer would pay 2500 to the 
manufacturer.) Find the expected amount that the insurer will pay. 


For a certain collection of contracts, N ~ Poisson(4), while X ~ Exp(3). Suppose that 
each individual claim is subject to a deductible of 2. If S’ is the total amount actually 
paid on all claims, find the variance of S’. 


For a certain firm, the number of losses of a certain type has a Poisson(2) distribution.
The amount of a loss takes a value of 100, 200 or 300, with probabilities 0.5, 0.3, 0.2,
respectively. The firm purchases an insurance policy that will cover all losses above
an aggregate deductible of 200. (So, for example, if there was one loss of 100 and
three losses of 300, the insurer would pay 800.) What is the expected reimbursement
by the insurer?


A manager of a certain office is offered a bonus each month if the total expenses are 
under 1000. The bonus is half the difference between 1000 and the expenses. So, for 
example, if expenses were 800, the bonus would be 100. Find the expected value of 
the bonus in each of the following cases. 


(a) Expenses are exponentially distributed with a mean of 2000. 
(b) Expenses are uniformly distributed on the interval [0, 4000]. 


The distribution of N is a mixture of Poisson($\lambda$) distributions, where $\lambda$ follows a
Gamma distribution with $\alpha = 10$ and $\beta = 2$. Moreover, X has a Pareto distribution
with $\theta = 4$, $\alpha = 3$. A per-claim deductible of 2 is applied. Find the expectation and
variance of the aggregate payments made on all claims.


You are the manufacturer of a product that gives guarantees against failure. Each 
month there is a 50% chance that there will be exactly one failure, a 30% chance that 
there will be exactly two failures, and a 20% chance that there will be exactly three 
failures. Moreover, 20% of failures will be complete, requiring a full reimbursement 
of 800, while 80% will require only partial reimbursement of 400. Each month you 
purchase insurance that will provide all reimbursements for that period above a total 
of 1000. (So, for example, if there were three complete failures, the insurer would pay 
1400.) What is the expected amount of reimbursement that the insurer will pay each 
month? 


For a certain insurer, the frequency of claims has a negative binomial distribution with 
an expected value of 16 and the claim severity distribution is Pareto (600, 2). The 
insurer is planning to introduce a per-claim deductible of 200. If this is done, what 
would be the reduction in the expected value of aggregate claims? 


$E(X \wedge d)/E(X)$ is known as the loss elimination ratio (LER), since it gives the proportion
of the risk to the insurer that is eliminated by a deductible of d. For each
of the following distributions, find the LER in terms of d and the parameters. (a)
$X \sim \text{Exp}(\lambda)$, (b) $X \sim \text{Pareto}(\theta, \alpha)$, (c) $X \sim \text{Gamma}(2, \beta)$. In each case verify that your
answers have the correct limits as d approaches 0 or $\infty$.


Suppose that the severity distribution changes from X to (1 + r)X due to inflation. 


(a) Show that if the deductible d is increased to (1 + r)d, the LER is unchanged. What 
happens to the LER if d is unchanged? 


(b) Suppose that X is exponentially distributed and for a certain value of d the LER 
is 0.3. If r = 0.10 and d is unchanged, what is the new LER? 


Suppose that N is a continuous mixture of Poisson($\Lambda$) distributions, where $\Lambda \sim$ 
Gamma(2, 1) and X takes the value 1 with probability 0.6 and 2 with probability 
0.4. If $f_S(10) = c$ and $f_S(11) = d$, find a formula for $f_S(12)$ in terms of c and d. 


For a certain collection of contracts, X takes values 1, 2, 3, each with equal 
probabilities. You know that the insurer uses a recursion formula to calculate g(s) = P[S = s] 
and you are trying to determine what distribution is being used for N. All you have to 
go on is a scrap of paper on which the following appears: 


\[ g(4) = x\,g(3) + y\,g(2) + z\,g(1), \]


where x, y, and z are numbers that have been smudged and are unreadable. You can, 
however, read enough of the numbers to definitely conclude that y is strictly less than 
2x. What is the distribution of N and why? (Just identify the basic type. You do not 
have enough information to determine the parameters.) 


Compute $g_S(x)$ for x = 0, 1, ..., 5 for the following three compound distributions, each 
with claim amount distribution given by $f_X(1) = 0.7$ and $f_X(2) = 0.3$: (a) Poisson with 
$\lambda = 4.5$; (b) negative binomial with r = 4.5 and p = 0.5; (c) binomial with m = 9 and 
p = 0.5. 


Suppose you have 11 boxes, numbered 1 to 11, and each box contains three balls, of 
which two are numbered 1 and one is numbered 2. A coin has a probability of 0.6 of 
coming up heads. You are going to toss the coin 11 times, and if a head occurs on the 
ith toss, you are going to take box number i and randomly select a ball. Let g denote 
the probability function of the random variable S, the total of all the selected balls. 
Given that g(10) = 0.1386 and g(11) = 0.1055, find g(12). 


Aggregate claims S follow a compound Poisson distribution with $\lambda = \log(4)$ and with 
the probability function of X given by $f_X(k) = 2^{-k}/(k \log 2)$, k = 1, 2, .... What is 
the distribution of S? 


Suppose that $(N_i)$, i = 1, 2, 3, is an independent family of random variables where $N_i \sim$ 
Poisson(i). If $S = 3N_1 + 2N_2 + 5N_3$, find distributions N and X so that S ~ (N, X). 


Risk assessment 


22.1 Introduction 


The previous chapter was largely devoted to computing or approximating the distribution of 
aggregate claims for the losses on an insurance portfolio. The next problem that arises is to 
effectively use this information to assess and manage the risk associated with the insurer’s 
commitment to pay these losses. Similarly, a consumer is interested in assessing the extent 
to which their risk is transferred by the purchase of insurance. We alluded to this theme 
somewhat in Part II of the book but we now wish to investigate some of the issues in more 
detail. Our concentration will be on a more general basic question that has application in 
many areas. Given two or more uncertain alternatives, how do we compare or measure the 
amount of risk associated with each? This is a large topic and we confine ourselves here to a 
survey of some of the main ideas. It is important in what follows to distinguish between two 
cases. The quantities in question may involve losses in which case less is better, or gains, in 
which case more is better. We could conventionally fix one or the other, by introducing minus 
signs, but that complicates the notation, so we rely on the context to clarify what is intended. 
We start in the next section by talking about gains. 


22.2 Utility theory 


One method which may seem natural for deciding between two random payouts is to compare 
the expected amounts that you will receive. However, this does not always give reasonable 
answers and does not always conform to choices that rational people actually make, as the 
following example indicates. Imagine that you are offered the following two risky alternatives. 
In alternative 1, you gain 200 with probability 0.99, or lose 10000 with probability 0.01. In 
alternative 2, you gain 200 with probability 0.5 or lose 10 with probability 0.5. Alternative 1 
has an expectation of 98, which is greater than 95, the expectation of alternative 2. However, 
nearly everybody would reject alternative 1 with the possibility, although small, of a very 
large loss. Expectation does not take into account the amount of the risk involved. 

Perhaps the most famous example of the drawbacks inherent in using the expected value as 
a decision tool is the St. Petersburg Paradox, formulated by Daniel Bernoulli in the eighteenth 
century. His explanation marked the beginning of the concept of utility theory. A game 
consists of tossing a coin until a head appears, with a payout of $2^n$ if this occurs on the 
nth toss. The expectation of the amount to be won is then $\sum_{n=1}^{\infty} 2^n (1/2^n) = \infty$, but it is not 
reasonable to expect that someone would pay an arbitrarily large amount to play this game. 
Bernoulli introduced the idea that one should not consider the actual amounts paid but rather 
the ‘satisfaction’ or ‘utility’ that comes with possessing a certain level of wealth. In other 
words, he postulated that each individual has a so-called utility function u, where u(x) denotes 
the utility that the person derives from having x units of wealth. It is expected that u is an 
increasing function of x, as more wealth gives additional utility, but that the rate of increase 
diminishes with increasing x. A person who is already a multimillionaire will derive little 
satisfaction from an additional 1 unit of wealth, while somebody who is destitute would 
welcome it greatly. In mathematical terms, it is expected that for most people u is a concave 
function. (We give a precise definition in Section 22.3.) If u is differentiable, the above 
features simply mean that the first derivative of u is nonnegative and the second derivative is 
nonpositive. A typical example of such a function is log x, which was chosen by Bernoulli in 
his explanation of the St. Petersburg Paradox. He argued that one should consider the expected 
utility rather than the expected value of the actual amounts, and indeed $\sum_{n=1}^{\infty} \log(2^n)(1/2^n)$ 
is finite. 
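
As a numerical check on the two series above, the following Python sketch (not part of the original text; the function name is our own) accumulates the first terms of each. Every term of the expected payout contributes exactly 1, so the partial sums grow without bound, while the expected log-utility settles down to 2 log 2.

```python
import math

def st_petersburg_partial_sums(n_terms):
    """Partial sums of the expected payout and of the expected log-utility
    of the payout, for the game truncated after n_terms tosses."""
    expected_payout = 0.0
    expected_log_utility = 0.0
    for n in range(1, n_terms + 1):
        prob = 0.5 ** n      # probability that the first head is on toss n
        payout = 2.0 ** n    # payout in that case
        expected_payout += payout * prob              # adds exactly 1 each time
        expected_log_utility += math.log(payout) * prob
    return expected_payout, expected_log_utility

payout_50, utility_50 = st_petersburg_partial_sums(50)
print(payout_50, utility_50, 2 * math.log(2))
```

Doubling the number of terms doubles the first quantity but leaves the second essentially unchanged, which is the content of Bernoulli's resolution.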

People with concave utility functions are called risk-averse, since they prefer certainty to 
uncertainty, and therefore derive utility from insuring, as the following example illustrates. 


Example 22.1 People are faced with a potential loss of 100, which will occur with 
probability 0.1. Their goal is to maximize the expected utility of their resulting wealth. How 
large a single premium P would they pay to insure against such a loss if their utility function 
is given by u(x) = log(x), and their initial wealth is (a) 1000? (b) 500? 


Solution. In part (a), if they do insure, they will have utility of u(1000 − P) = log(1000 − P), 
which is certain; while if they do not insure, they will have an expected utility of 
$0.9u(1000) + 0.1u(900) = \log[(1000)^{0.9}(900)^{0.1}]$. Equating the resulting expected utility, the largest P they 
would pay is given by 


\[ 1000 - P = 1000^{0.9}\, 900^{0.1}, \]


which is solved to give P = 10.48. So such individuals would be willing to pay more than the 
expected loss of 10 in order to acquire the utility they derive from the extra security. 

In part (b), the equation changes to $500 - P = 500^{0.9}\, 400^{0.1}$, which is solved to give 
P = 11.03. The premium is higher, reflecting the fact that people with the lower wealth are 
less prepared to suffer a loss and will pay more for insurance. This indicates the important fact 
that in general one must take into account initial wealth and not just the particular transaction 
when comparing alternatives as to expected utility. (See Exercise 22.1 for an exception to this 
statement.) 

Economists use many other choices of utility functions in their desire to model people's 
preferences. One of the most popular choices is the family of so-called power utility functions. 
This is a parametrized family, which includes the log function, and is defined by 


\[ u_\gamma(x) = \frac{x^{1-\gamma} - 1}{1 - \gamma}, \]


for some parameter $\gamma > 0$. It is clear that the first two derivatives have the property stated 
above. 

Readers should not be dismayed by the fact that $u_\gamma(x)$ can take negative values, for as 
long as we use these for comparative purposes there is no difficulty. Indeed, if we replace any 
utility function u by the function au + b, where a and b are constants with a > 0, it follows 
from the linearity of expectation that we will always obtain the same results when comparing 
two alternatives as to expected utility. 

From this point of view we could have described the above family by the functions $x^{1-\gamma}$ 
for $\gamma < 1$ or $-x^{1-\gamma}$ for $\gamma > 1$. However, an advantage of the given form is that we can define 
the value at $\gamma = 1$ by taking a limit. Applying L'Hôpital's rule, we get 


\[ u_1(x) = \lim_{\gamma \to 1} u_\gamma(x) = \log x. \]


As $\gamma$ increases, individuals become more risk-averse, as indicated by the fact that they 
will pay more to reduce risk. For example, if we redo part (a) of Example 22.1 with $\gamma = 2$, we 
get the equation 


\[ -(1000 - P)^{-1} = -0.9(1000)^{-1} - 0.1(900)^{-1}, \]


which is solved to give P = 10.99, an amount greater than the premium of 10.48 for $\gamma = 1$. 
Note that $u_0(x) = x - 1$, which indicates there is no risk aversion and comparison is done 
simply by expected values. A person with such a utility function is the risk-neutral individual 
described in Section 20.5. 
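
The premiums quoted for Example 22.1 can be reproduced with a few lines of Python. This is only a sketch under the assumptions of that example (a single loss of fixed size occurring with a given probability); the function names are our own and do not come from the text. The largest acceptable premium is the initial wealth minus the certainty equivalent, that is, the wealth whose utility equals the expected utility of remaining uninsured.

```python
import math

def power_utility(x, gamma):
    """Power utility, with the log function as the limiting case gamma = 1."""
    if gamma == 1:
        return math.log(x)
    return (x ** (1 - gamma) - 1) / (1 - gamma)

def inverse_power_utility(u, gamma):
    """Wealth level whose power utility equals u."""
    if gamma == 1:
        return math.exp(u)
    return (1 + (1 - gamma) * u) ** (1 / (1 - gamma))

def max_premium(wealth, loss, prob, gamma):
    """Largest premium P with u(wealth - P) equal to the expected utility
    of staying uninsured against a loss of size `loss` with probability `prob`."""
    expected_utility = ((1 - prob) * power_utility(wealth, gamma)
                        + prob * power_utility(wealth - loss, gamma))
    return wealth - inverse_power_utility(expected_utility, gamma)

for gamma in (0, 1, 2):
    print(gamma, round(max_premium(1000, 100, 0.1, gamma), 2))
print(round(max_premium(500, 100, 0.1, 1), 2))  # part (b) of Example 22.1
```

With an initial wealth of 1000 the premiums increase with the parameter, from the expected loss of 10 in the risk-neutral case through 10.48 to 10.99.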
People with a convex (defined in Section 22.3) utility function would be termed 
risk-seekers, as such people will pay to gamble, even at unfavourable odds (typical of the usual 
casino). The following example illustrates the effect of such a utility function. 


Example 22.2 People with an initial wealth of 10 are offered a chance to play a game in 
which they win either 2 or 0, each with probability 1/2. If their utility function is given by the 
convex function $u(x) = x^2$, what is the most they will pay to play this game? 


Solution. Let P be the amount paid to play the game. We equate the expected utility of not 
playing versus playing which gives the equation 


\[ 100 = 0.5[(12 - P)^2 + (10 - P)^2]. \]


Solving, P = 1.0501, which is more than the expected winnings of 1. 

The examples of this section should provide further clarification of the difference between 
insurance and gambling that we alluded to in Section 1.1. In the next section, we provide 
some more precise definitions and mathematical verification of our conclusions. 
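
The figure obtained in Example 22.2 can be checked numerically. The sketch below is our own illustration, not the author's method: it finds the largest stake by bisection rather than by solving the quadratic, and it assumes that the expected utility of playing decreases as the stake rises from 0 to the initial wealth.

```python
def max_stake(wealth, prizes, probs, utility, tol=1e-10):
    """Largest stake P with E[u(wealth + prize - P)] = u(wealth), by bisection.
    Assumes playing is worthwhile at P = 0 and not worthwhile at P = wealth."""
    def gain(stake):
        played = sum(q * utility(wealth + z - stake) for z, q in zip(prizes, probs))
        return played - utility(wealth)

    lo, hi = 0.0, float(wealth)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gain(mid) > 0:
            lo = mid   # still attractive, so a larger stake is acceptable
        else:
            hi = mid
    return (lo + hi) / 2

stake = max_stake(10, [2, 0], [0.5, 0.5], lambda x: x * x)
print(round(stake, 4))
```

Replacing the convex utility by a linear one in the same call returns the expected winnings of 1, the risk-neutral stake.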


22.3 Convex and concave functions: Jensen’s inequality 


22.3.1 Basic definitions 


Definition 22.1 A real-valued function g defined on an interval I of the real line is said to be 
convex if for all x and y in I and $0 \le \alpha \le 1$: 


\[ g(\alpha x + (1 - \alpha)y) \le \alpha g(x) + (1 - \alpha) g(y). \tag{22.1} \]


Geometrically, this says that a line segment joining any two points of the graph of g will 
lie above the graph. 

A feature of convex functions that is often used is the increasing slope condition, which 
states that for three points x < y < z in I, 


\[ \frac{g(y) - g(x)}{y - x} \le \frac{g(z) - g(y)}{z - y}. \tag{22.2} \]


To derive (22.2) note that 


\[ y = \frac{z - y}{z - x}\,x + \frac{y - x}{z - x}\,z, \]


so that 


\[ g(y) \le \frac{z - y}{z - x}\,g(x) + \frac{y - x}{z - x}\,g(z). \]


Now multiply this inequality by $(z - x)/[(z - y)(y - x)] = (y - x)^{-1} + (z - y)^{-1}$ to get 


\[ g(y)\left[\frac{1}{y - x} + \frac{1}{z - y}\right] \le \frac{g(x)}{y - x} + \frac{g(z)}{z - y}, \]


and rearrange to get (22.2). 

Many readers will be familiar with the result from basic calculus that for twice 
differentiable functions, convexity is characterized by the fact that the second derivative is nonnegative. 
The advantage of the general definition above is that we can apply it to functions that 
have points of nondifferentiability, such as the two families of functions (using the notation 
of Section 21.10.1): 


\[ u_d(x) = (x - d)_+, \qquad v_d(x) = d - x \wedge d, \]


whose graphs are shown in Figure 22.1. Any piecewise linear function (one whose graph 
consists of a finite number of straight line segments) can be written as a linear combination of 
the $u_d$'s and $v_d$'s. In view of the increasing slope condition, a convex piecewise linear function 
can be written as a linear combination of these with positive coefficients. 

Figure 22.1 Graphs of $u_d(x)$ and $v_d(x)$ 

For example, the function defined on the real line by 


|j Il, if x < 2, 
fay= 4 Re if2<x 
can be written as vo + ug + 2u. 


Definition 22.2 A function defined on an interval I of the real line is said to be concave if 
the inequality reverses in (22.1) and therefore also in (22.2). 


For concave functions, straight line segments between any two points of the graph 
now lie below the graph. For twice differentiable functions the second derivative is nonpositive. 
Clearly, a function g is concave if and only if —g is convex. 


22.3.2 Jensen’s inequality 


We introduce a basic inequality for risk assessment. To motivate, imagine that you are 
presented with a bag with two numbered balls. You draw one at random and receive the square of 
the number. If both the balls had number 5, you get 25 for sure. What if they were numbered 
4 and 6, which average 5? You now get an expected payoff of 0.5(36 + 16) = 26 > 25, so the 
randomness has produced an extra expected return over the certain case. We can see exactly 
why this happens by writing 


\[ (5 + 1)^2 = 5^2 + 2(5) + 1^2 \]

and 

\[ (5 - 1)^2 = 5^2 - 2(5) + (-1)^2. \]

The key to this result then is the fact that $(-1)^2 = 1$. When you average, the middle 
terms cancel but the squared term is the same for the positive and negative deviations, giving 
the extra amount. This calculation shows that you will get the same conclusion for any two 
numbered balls. What about for three balls? Say you have numbers 1, 2, 6 which average 3. 
The average of the squares is 13 2/3, which is greater than $3^2$. Indeed, the principle involved 
holds in great generality since the inequality 


\[ 0 \le E[(X - \mu)^2] = E(X^2) - \mu^2, \]


where $\mu = E(X)$, shows that for any random variable X, the expectation of $X^2$ must be greater 
than or equal to the square of the expectation of X. 

What happens for functions other than the square? For the square root function, the 
inequality goes the other way. If you have two balls numbered 16 and 4 with an average of 
10, the average of the square roots is 3, which is less than $\sqrt{10}$. It turns out that we get the 
extra return with any convex function, and therefore a reduced return from the average with 
any concave function. The formal statement is as follows. 


Theorem 22.1 (Jensen’s inequality) For any convex function g, 

\[ E[g(X)] \ge g(E(X)). \]


Proof. We will give the proof in the case where g has a continuous second derivative. The idea 
is that by our remarks above, it is true for any polynomial of degree two which has a positive 
coefficient of $x^2$. Taylor's theorem from basic calculus tells us that g can be approximated by 
such a polynomial. Precisely, if $E(X) = \mu$, then 


\[ g(x) = g(\mu) + (x - \mu) g'(\mu) + \frac{(x - \mu)^2}{2} g''(\xi), \]


for some point $\xi$ between $\mu$ and x. Of course we do not know $\xi$ but we do not need it, since if 
g is convex, then $g''$ is nonnegative, and the left-hand side is greater than or equal to the sum 
of the first two terms. Taking expectations, 


\[ E[g(X)] \ge g(\mu) + g'(\mu) E(X - \mu) = g(\mu), \]


completing the proof. 


Jensen's inequality obviously reverses for a concave function, as seen by applying the 
statement above to the function —g. This shows that in general we have the conclusions 
shown by the particular examples of the last section. That is, as we would expect, risk-averse 
individuals prefer certainty to risk. Such individuals with initial wealth w who pay E(X) 
to insure against a loss X will have resulting utility of u(w — E(X)) which is greater than 
E[u(w — X)]. Therefore, they are willing to pay somewhat more than the expected value of the 
benefits. This makes the business of insurance economically feasible, since as we indicated 
in previous chapters, the insurer must necessarily charge more than the expected value of the 
loss in the form of loadings for expenses, profits and risk. 
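
The numbered-ball examples used to motivate Jensen's inequality are easy to verify directly. The short Python sketch below (the helper `expectation` is our own name) computes both sides of the inequality for the convex function $x^2$ and the concave square root function.

```python
import math

def expectation(values, probs, g=lambda x: x):
    """E[g(X)] for a finite discrete random variable."""
    return sum(p * g(v) for v, p in zip(values, probs))

# Three balls numbered 1, 2, 6: a convex function raises the expected payoff.
balls = [1, 2, 6]
probs = [1 / 3] * 3
mean = expectation(balls, probs)
mean_of_squares = expectation(balls, probs, lambda x: x * x)
print(mean_of_squares, mean ** 2)

# Two balls numbered 16 and 4: a concave function lowers it.
mean_of_roots = expectation([16, 4], [0.5, 0.5], math.sqrt)
print(mean_of_roots, math.sqrt(10))
```

The first pair of numbers illustrates $E[g(X)] \ge g(E(X))$ for convex g, and the second pair the reversed inequality for concave g.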


22.4 A general comparison method 


In this section, we consider the general problem of comparing two random variables as to their 
degree of risk. There are many ways of defining such an order relationship. We will concentrate 
on one of the early definitions, introduced in Rothschild and Stiglitz (1970) which ties in with 
the utility theory concept. For simplification, we concentrate on the equal mean case. 

Definition 22.3 For nonnegative random variables X and Y with E(X) = E(Y), we say that 
X is less risky than Y, if for all d, 


\[ E(X - d)_+ \le E(Y - d)_+. \tag{22.3} \]


For example, if our random variables represent losses, then the net premium for deductible 
insurance that covers losses above a certain amount is always less for the less risky option. 
It is clear that this relation does not depend on the actual random variables but only on their 
distribution. 

In view of the equal mean hypothesis, formula (21.11) shows that an equivalent formulation 
is that for all d > 0, 


\[ E(X \wedge d) \ge E(Y \wedge d). \tag{22.4} \]


We can apply the above inequalities to the functions introduced in the previous section. 
The result is that if X is less risky than Y, then for all d > 0, 


\[ E[u_d(X)] \le E[u_d(Y)], \qquad E[v_d(X)] \le E[v_d(Y)], \tag{22.5} \]

and therefore, for any piecewise linear convex function g, 

\[ E(g(X)) \le E(g(Y)), \tag{22.6} \]


since the positive coefficients, when we express g as a linear combination of the $u_d$ and $v_d$ 
for various values of d, will preserve the order. Now given an arbitrary convex function, we 
can approximate it by a piecewise linear one as follows. Choose a large number of points on 
the graph and join them with straight line segments. The increasing slope condition shows 
that this approximating function will be convex. The more points that we choose, the better 
the approximation will be. By standard approximation techniques in analysis it follows that 
(22.6) holds for all convex functions (we do not give the exact details here), and of course 
the reverse inequality holds for concave functions. We are therefore led to the conclusion that 
if risk-averse individuals have to choose between two distributions of their final wealth, both 
of the same mean, they will choose the less risky one according to our definition, in order 
to maximize expected utility. This justifies the definition given by Rothschild-Stiglitz as one 
which truly captures the concept of riskiness. 

It is not always easy to decide whether one random variable is less risky than another, 
but there are certain instances when we can verify this. It is assumed in the following two 
theorems that E(X) = E(Y). 


Theorem 22.2 Suppose that X takes the values $a_1 \le a_2 \le \cdots \le a_{N-1} \le a_N$, each with 
probability 1/N, and Y takes the values $b_1 \le b_2 \le \cdots \le b_{N-1} \le b_N$, each with probability 1/N. Then 
X will be less risky than Y if and only if 


\[ \sum_{i=1}^{k} a_i \ge \sum_{i=1}^{k} b_i \tag{22.7} \]


for $1 \le k \le N$. 

Proof. Suppose the condition holds. Then, given d where $a_k \le d < a_{k+1}$, we have 


\[ E(X \wedge d) = \frac{1}{N}\left[\sum_{i=1}^{k} a_i + (N - k)d\right] \ge \frac{1}{N}\left[\sum_{i=1}^{k} b_i + (N - k)d\right] \ge E(Y \wedge d). \]


The last inequality follows from the fact that $b_i \wedge d$ is less than or equal to both $b_i$ and d for 
all i. 


Conversely, suppose X is less risky than Y. Fix any index $k \le N$, and let $d = b_k$. 
For every i we have $v_d(a_i) \ge 0$ and $v_d(a_i) \ge d - a_i$, while $v_d(b_i) = d - b_i$ for $i \le k$ and 
$v_d(b_i) = 0$ for $i > k$. Invoking (22.5), we see that 


\[ \frac{1}{N}\sum_{i=1}^{k}(d - a_i) \le E[v_d(X)] \le E[v_d(Y)] = \frac{1}{N}\sum_{i=1}^{k}(d - b_i), \]


and condition (22.7) follows. 


Remark Since repetition of values is allowed, the above theorem can be applied to all finite 
discrete distributions where the probabilities are rational numbers. 


Here is another condition which applies to all distributions. We can view this geometrically 
as saying that if two distribution functions intersect at one point, then the steeper curve gives 
the less risky distribution (see Figure 22.2). 

Theorem 22.3 (The cut condition) Suppose that for some point c, 

\[ F_X(t) \le F_Y(t) \text{ for } t < c, \quad \text{while} \quad F_X(t) \ge F_Y(t) \text{ for } t > c. \]


Then X is less risky than Y. 


Proof. We use formulas (21.10) and (21.11), and note that the stated inequalities in the 
premise reverse with s in place of F. If d < c, 


\[ E(X \wedge d) = \int_0^d s_X(x)\,dx \ge \int_0^d s_Y(x)\,dx = E(Y \wedge d), \]

while if d > c, 

\[ E(X - d)_+ = \int_d^\infty s_X(x)\,dx \le \int_d^\infty s_Y(x)\,dx = E(Y - d)_+. \]


By continuity of the function $g(d) = E(X - d)_+$, we must obtain the same result at d = c. In 
all cases we have by definition that X is less risky than Y. 


Figure 22.2 The cut condition: X is less risky than Y 


Remark The definition of being less risky and the cut condition apply equally well to 
random variables that take negative values. The complication in the proof is that formula 
(21.12) no longer holds as given and a suitable modification is required. We leave this to the 
interested reader. 


Readers should be aware that our ordering in this case is what mathematicians call a 
partial order, meaning that certain pairs are incomparable. Given a choice of X and Y, it may 
happen that some risk-averse people would prefer a final wealth of X and others would prefer 
Y, so neither is less risky than the other according to our definition. Here is a typical example. 

The random variable X, representing a gain, takes the value 1 with probability 0.1 and 
10 with probability 0.9, while an alternative Y takes the value 2 with probability 0.9 and 73 
with probability 0.1. Both have a mean of 9.1. One might expect a risk averter to choose X, 
ensuring themselves of a return of 10 in most cases, rather than gamble on the higher return 
which most of the time will lead to a return of only 2. However, by Theorem 22.2, since 
1 < 2, while 11 > 4, they are incomparable. This happens since there could be someone so 
risk averse that they could not tolerate even the small chance of a return of only 1. Indeed, 
consider the following scenario. Suppose individuals absolutely need 2 units of wealth or 
something dreadful will happen to them. They might well want to ensure that this terrible fate 
cannot occur by choosing Y. 
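
The partial-sum test of Theorem 22.2 is simple to automate. The following Python sketch (the function name `is_less_risky` is our own) assumes, as in the theorem, equally likely outcomes and equal means. To apply it to the example above we follow the Remark after the theorem and list each value as many times as its probability, in tenths, requires.

```python
from itertools import accumulate

def is_less_risky(a_values, b_values, tol=1e-12):
    """Partial-sum test of Theorem 22.2: True if the variable with equally
    likely outcomes a_values is less risky than the one with b_values."""
    a_sorted, b_sorted = sorted(a_values), sorted(b_values)
    if len(a_sorted) != len(b_sorted):
        raise ValueError("both lists must contain the same number of outcomes")
    if abs(sum(a_sorted) - sum(b_sorted)) > tol:
        raise ValueError("the theorem assumes equal means")
    return all(sa >= sb - tol
               for sa, sb in zip(accumulate(a_sorted), accumulate(b_sorted)))

# X: 1 with probability 0.1 and 10 with probability 0.9.
# Y: 2 with probability 0.9 and 73 with probability 0.1.
x_outcomes = [1] + [10] * 9
y_outcomes = [2] * 9 + [73]

print(is_less_risky(x_outcomes, y_outcomes))  # fails at k = 1, since 1 < 2
print(is_less_risky(y_outcomes, x_outcomes))  # fails at k = 2, since 4 < 11
```

Both calls return False, confirming that neither variable is less risky than the other.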

Of course, when we are able to make a comparison, the conclusion is that much stronger. 
As a typical example we will prove a well-known result on optimal choices of insurance, 
which was proved in Arrow (1963). 

The idea is as follows. Suppose that individuals want to insure against a loss by paying 
some fixed premium, which is however insufficient to provide for full coverage, and so they 
must arrange for partial reimbursement. 

Let X denote a random loss (assumed to be nonnegative) and P denote the premium they 
want to pay. They must choose some I(X), a function of X, which will be the amount paid 
to them when a loss occurs. We make the natural assumption that $0 \le I(X) \le X$, so that the 
amount paid cannot be negative and cannot exceed the amount of the loss. We assume that 
to provide the coverage, the insurer charges a premium that is a function $\pi$ of E[I(X)]. Any 
of the modifications discussed in Section 21.10.1, as well as others, could be used. As an 
example, suppose the insurer charges a 20% loading above the expected value of the loss and 
E(X) = 100, while P = 60. The insured then is only going to pay one-half of the premium 
necessary for full coverage. One way of doing so is to choose I(X) = 0.5X so whatever the 
loss is, the insured will receive one-half of this amount as reimbursement. Another alternative 
is to set a maximum on the amount paid, and of course there are many other possibilities. 
Which choice is best? 


Theorem 22.4 Given the assumptions above, any risk-averse individual should choose 
deductible insurance, in order to maximize the utility of resulting wealth. 


Proof. Suppose the individuals start with an initial wealth of a. Their resulting wealth after 
paying the premium, incurring the loss and being reimbursed, will be 


\[ W = a - P - X + I(X). \]


In particular let $I_0(X) = (X - d)_+$, where d is chosen to satisfy $\pi[E(I_0(X))] = P$, and let $W_0$ 
be the resulting wealth for this case. Apply the cut condition with $c = a - P - d$. Since $W_0$ is 
never less than c, then $F_{W_0}(t) = 0 \le F_W(t)$, for any t < c. Consider now the case when t > c. 
If $W_0 > t$, then $X - I_0(X) < d$, so the loss must have been under the deductible and 
$W_0 = a - P - X$. From our assumptions on the reimbursement, we must have $0 \le I(X) \le X < d$, showing that 
$W \ge W_0 > t$. It follows that $F_W(t) \le F_{W_0}(t)$, completing the proof. 


The conclusion seems intuitively clear since deductible insurance avoids large catastrophic 
losses, which should appeal to the risk averter. 
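
To see Theorem 22.4 at work, the following Python sketch compares deductible insurance with the 50% proportional cover mentioned above. Everything in it is a hypothetical illustration chosen by us: a loss taking the values 0, 50, 100, 150, 200 with equal probability (so that E(X) = 100), an initial wealth of 500, log utility, and the 20% loading of the example, under which a premium of 60 buys an expected reimbursement of 50.

```python
import math

losses = [0, 50, 100, 150, 200]   # hypothetical loss distribution, mean 100
probs = [0.2] * 5
wealth, premium = 500.0, 60.0
target = 50.0                     # expected reimbursement bought by P = 60

def expected(f):
    """E[f(X)] under the hypothetical loss distribution."""
    return sum(p * f(x) for x, p in zip(losses, probs))

def find_deductible(target, lo=0.0, hi=float(max(losses))):
    """Bisection for d with E(X - d)_+ = target (decreasing in d)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected(lambda x: max(x - mid, 0.0)) > target:
            lo = mid   # reimbursement too generous, raise the deductible
        else:
            hi = mid
    return (lo + hi) / 2

d = find_deductible(target)

def proportional(x):
    return 0.5 * x

def deductible(x):
    return max(x - d, 0.0)

def expected_log_utility(reimbursement):
    return expected(lambda x: math.log(wealth - premium - x + reimbursement(x)))

print(d)
print(expected_log_utility(deductible), expected_log_utility(proportional))
```

Both forms of cover have the same expected reimbursement, but the deductible form gives the higher expected utility, as the theorem predicts for any concave utility function.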


Remark We leave to the interested reader the following extension of the above problem. 
The individual truly interested in maximizing utility would normally not specify the desired 
premium in advance, but would let that vary as well. The problem now is not only to choose 
the form of insurance but also to choose P, which will be a function of the initial wealth. 


22.5 Risk measures for capital adequacy 


22.5.1 The general notion of a risk measure 


In some cases we want to do more than just compare the riskiness of two random variables. 
We actually want to assign a number that in some sense quantifies or measures the risk, and 
of course that will then allow us to compare two or in fact any number of random variables. A 
function that assigns a number to a certain class of random variables is known appropriately 
enough as a risk measure. We already encountered this concept in Section 15.6.4. Assigning 
a premium to a random variable representing the benefits paid on an insurance contract is a 
form of risk measure. 

Another important use of this concept is to arrive at the amount of capital that should 
be held to cover possible losses. Now we have already introduced the concept of a reserve 
to provide for future obligations. Recall, however, from Section 15.5 that the reserve is the 
expected amount that one needs, and it is provided for by the excess premiums that are 
collected in early years. The reserve is not intended to provide for unexpected large losses. 
If these occur, the company will have to draw on surplus in order to maintain the required 
reserves, and the question is, how much capital should be on hand for this purpose. This idea is 
not just confined to insurance. Measures for this purpose are extensively used by the banking 
industry. The method that has been almost universally adopted by banks is a quantile-based 
approach, which we have already discussed in Section 15.7. For present purposes, we now 
introduce some more precise definitions and terminologies. 


22.5.2 Value-at-risk 


Suppose we have a random variable X and a number $\alpha$ between 0 and 1. A number x, such that 
$F_X(x) = \alpha$, is known as an $\alpha$-quantile of X. (An alternate terminology is to multiply $\alpha$ by 100 
and speak of a percentile. For example, a 0.7 quantile can be referred to as the 70th percentile.) 
The 0.5 quantile is commonly referred to as the median of the distribution. 

When the values of $F_X$ vary continuously from 0 to 1 over some interval, then $F_X^{-1}(\alpha)$ is 
the unique $\alpha$-quantile. We will denote this number by $q_\alpha$. In other cases there might not exist 
any such number, or there might exist infinitely many. The former arises when the value of $F_X$ 
jumps at a point x from a number less than $\alpha$ to one more than $\alpha$. In that case we take $q_\alpha$ to be 
the point of the jump. Such a point will of course be equal to $q_\beta$ for several different values of 
$\beta$. The case of infinitely many values arises when for some x, $F_X(x) = \alpha$, but X does not take 
any values in some open interval with a left endpoint of x. For example, if $F_X(3) = 0.7$ and 
the probability of X taking a value in the interval (3, 3.1) is zero, then $F_X(x)$ is also equal to 
0.7 for any x in the interval (3, 3.1). For our purposes we will want to single out the smallest 
such number. We can then cover all cases by the following. 


Definition 22.4 

\[ q_\alpha = \min\{x : F_X(x) \ge \alpha\}. \]


When X represents losses, or an amount to be paid out, $q_\alpha$ can be viewed as a type 
of risk measure, with higher values signifying more risk. The percentile premium was a risk 
measure of this type. In the banking industry, this risk measure has been termed value-at-risk 
and abbreviated as VaR. (The capital R at the end distinguishes it from the common notation 
for ‘variance’. It is pronounced to rhyme with ‘far’.) VaR is expressed with a certain time 
horizon and a confidence level $\alpha$, often taken to be a number reasonably close to 1 such as 0.95 
or 0.99. For example, to say that a certain investment portfolio has a 1-day VaR of 100 000 
means that 100 000 will be sufficient to ensure that the losses over the next day will be covered 
most of the time, where ‘most’ is measured by the specified confidence level. 
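
For a finite discrete distribution, Definition 22.4 translates directly into a short routine. The Python sketch below is our own illustration; the function name and the three-point distribution are hypothetical, chosen so that $F_X(3) = 0.7$ as in the discussion above. A small tolerance guards against rounding error when probabilities are accumulated in floating point.

```python
def value_at_risk(outcomes, probs, alpha, tol=1e-12):
    """Smallest x with F_X(x) >= alpha, for a finite discrete distribution."""
    cumulative = 0.0
    for x, p in sorted(zip(outcomes, probs)):
        cumulative += p
        if cumulative >= alpha - tol:
            return x
    return max(outcomes)

# Hypothetical loss: 1 with probability 0.3, 3 with 0.4, 5 with 0.3.
outcomes = [1, 3, 5]
probs = [0.3, 0.4, 0.3]

print(value_at_risk(outcomes, probs, 0.5))   # F jumps over 0.5 at x = 3
print(value_at_risk(outcomes, probs, 0.7))   # F(3) = 0.7, smallest such x is 3
print(value_at_risk(outcomes, probs, 0.71))  # now the quantile moves up to 5
```

The first call illustrates the case of a jump past the confidence level, and the second the case in which a whole interval of values satisfies $F_X(x) = \alpha$ and the smallest is selected.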


22.5.3 Tail value-at-risk 


There are problems with VaR, as we have already noted in connection with percentile 
premiums. It does not take into account how bad the losses can be when they exceed the chosen 
quantile. In the above example, if there were a possibility of a loss of several million, which 
occurred with probability less than $(1 - \alpha)$, we still would have a VaR of only 100 000. The 
possibility of this large loss would be completely ignored in our risk measure. For this, and 
other reasons, many people have advocated a modification of VaR known as tail value-at-risk 
(abbreviated as TailVaR or just TVaR). The same measure is also known as conditional tail 
expectation, abbreviated as CTE or sometimes TCE. 

For a given confidence level $\alpha$, $\mathrm{TVaR}_\alpha$ is essentially defined as the expected loss given 
that the loss is in excess of the quantile $q_\alpha$. Problems can arise in interpreting the words ‘in 
excess of’. Do these words mean ‘strictly greater than’ or ‘greater than or equal to’? The 
distinction is irrelevant for continuous distributions, but in the discrete case they can give 
different results, and, strangely enough, neither may be the one that you want. For example, 
suppose that somebody tosses a penny and a dime and you must pay them 1 for each head. Set 
$\alpha = 0.5$. The median loss is 1, and $\mathrm{TVaR}_{0.5}$ should be the expected value in the worst one-half 
of the distribution. There are four possibilities of equal probability giving respective losses of 
0, 1, 1, 2, so we want to select the two worst cases. Of course there is a tie, but a logical way to 
handle this is to arbitrarily pick any one of the two coins, say the penny, and we then say that 
the two worst outcomes will be when both coins come up heads, or the penny only comes 
up heads. With this reasoning $\mathrm{TVaR}_{0.5}$ should be 0.5(1 + 2) = 1.5. However, the expected loss, 
given that the loss is strictly greater than 1, will be 2, and the expected loss, given that the 
loss is greater than or equal to 1, will be 4/3. Readers are cautioned that there are examples 
in the literature where the ‘strictly greater than’ or ‘greater than or equal to’ methods are used in 
defining TVaR and similar risk measures. However, both of these can lead to inconsistencies. 
See, for example, Exercise 22.6. These are avoided by the definition below, which follows 
from the idea presented in this simple example. 

For a continuous distribution we define our desired risk measure by

$$\mathrm{TVaR}_\alpha(X) = \frac{1}{1-\alpha} \int_{q_\alpha}^{\infty} x f_X(x)\,dx. \tag{22.8}$$

Now by definition

$$\int_{q_\alpha}^{\infty} f_X(x)\,dx = P(X > q_\alpha) = 1 - \alpha,$$

so we can add and subtract $q_\alpha$ on the right-hand side to get the following.


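Definition 22.5

$$\mathrm{TVaR}_\alpha(X) = q_\alpha + \frac{1}{1-\alpha} \int_{q_\alpha}^{\infty} (x - q_\alpha) f_X(x)\,dx = q_\alpha + \frac{1}{1-\alpha} E(X - q_\alpha)_+. \tag{22.9}$$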


The above formula is the best form of the definition to use as it applies to any distribution, 
not just a continuous one. 

As a check, we verify that it gives the results we expect in the discrete case. (In particular
we retrieve the answer of 1.5 obtained in the coin-flip example.) We calculate TVaR for
the particular type of discrete random variables considered in Theorem 22.2. For a definite
example, let $N = 10$ and suppose that our confidence level $\alpha = 7/10$. Then $\mathrm{VaR}_{0.7}(X)$ clearly
equals $a_7$, which will cover the loss whenever the outcome is $a_i$, $i \le 7$, which occurs 7/10 of
the time. What about $\mathrm{TVaR}_{0.7}$?
From Definition 22.5, this is just

$$a_7 + (10/3)(1/10)[(a_8 - a_7) + (a_9 - a_7) + (a_{10} - a_7)] = (1/3)[a_8 + a_9 + a_{10}], \tag{22.10}$$

as we would expect. In general, if $\alpha = r/10$ for an integer $r$, then $\mathrm{VaR}_\alpha(X) = a_r$ and
$\mathrm{TVaR}_\alpha(X) = (a_{r+1} + \cdots + a_{10})/(10 - r)$.

Things are a bit more complicated when for some integer $r$, $(r - 1)/10 < \alpha < r/10$.
Suppose in the above example that $\alpha = 13/20$, which is between 6/10 and 7/10. We still get
$\mathrm{VaR}_\alpha(X) = a_7$ but now $(1 - \alpha)^{-1} = 20/7$, so Equation (22.9) yields

$$\mathrm{TVaR}_{13/20}(X) = \frac{a_7 + 2a_8 + 2a_9 + 2a_{10}}{7} = \frac{4}{7}\left(\frac{a_7 + a_8 + a_9 + a_{10}}{4}\right) + \frac{3}{7}\left(\frac{a_8 + a_9 + a_{10}}{3}\right).$$


The general formula is as follows. If $\alpha = (r - p)/N$ where $r$ is an integer and $0 < p < 1$,
then

$$\mathrm{TVaR}_\alpha = f\,\mathrm{TVaR}_{(r-1)/N} + (1 - f)\,\mathrm{TVaR}_{r/N}, \tag{22.11}$$

where $f = p(N - r + 1)/(N - r + p)$. We leave this for the reader to verify. To actually compute
TVaR in this discrete case, it is usually more efficient to use Definition 22.5 directly, but
(22.11) is useful for demonstrating properties of TVaR, as we will illustrate later.
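
The discrete calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `var_discrete` and `tvar_discrete` are hypothetical, the outcomes are taken to be equally likely, the quantile is taken to be the smallest value whose cumulative probability reaches the confidence level, and the sample values are made up. Exact fractions are used so that the coin-flip answer and the interpolation formula (22.11) can be checked without rounding.

```python
from fractions import Fraction

def var_discrete(values, alpha):
    """Smallest value x with P(X <= x) >= alpha, for equally likely outcomes."""
    xs = sorted(values)
    n = len(xs)
    for k, x in enumerate(xs, start=1):
        if Fraction(k, n) >= alpha:
            return x
    return xs[-1]

def tvar_discrete(values, alpha):
    """Definition 22.5: q + E[(X - q)_+] / (1 - alpha)."""
    q = var_discrete(values, alpha)
    expected_excess = Fraction(sum(max(x - q, 0) for x in values), len(values))
    return q + expected_excess / (1 - alpha)

# Coin-flip example: losses 0, 1, 1, 2 with equal probability.
coin_tvar = tvar_discrete([0, 1, 1, 2], Fraction(1, 2))
print(coin_tvar)  # 3/2

# Check (22.11) with N = 10, r = 7, p = 1/2, so alpha = 13/20 (illustrative values).
a = [3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
N, r, p = 10, 7, Fraction(1, 2)
alpha = (r - p) / N
f = p * (N - r + 1) / (N - r + p)
lhs = tvar_discrete(a, alpha)
rhs = f * tvar_discrete(a, Fraction(r - 1, N)) + (1 - f) * tvar_discrete(a, Fraction(r, N))
print(lhs, rhs)  # 141 141
```

Here the weight works out to f = 4/7, matching the decomposition in the example with confidence level 13/20.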

Many people believe that a risk measure H used for premiums or capital adequacy should 
be subadditive. Namely, they want that


$$H(X + Y) \le H(X) + H(Y).$$


After all, if we have to set aside H(X) of capital to provide for a risk X and a further H(Y) 
of capital to provide for a risk Y, then it is reasonable to suppose that we will not need more 
than this sum if we take on both risks. (This viewpoint is not completely universal and some 
argue that such activities as mergers can produce inefficiencies and cause other problems 
that actually increase the total risk.) We already saw in Example 15.6 that a quantile risk 
measure does not satisfy subadditivity, and this has been a major criticism levelled against the 
use of VaR. On the other hand, TVaR is subadditive. A completely general proof is somewhat
advanced, and we will confine attention here to showing this for the discrete random variables
that we introduced above, where the proof is straightforward and clearly indicates the reason
for the result.

Suppose we have a sample space $\{\omega_1, \omega_2, \ldots, \omega_N\}$, each point with probability $1/N$, and $X$ takes
the value $a_i$ on $\omega_i$, where $a_i \le a_j$ for $i \le j$. Suppose that the random variable $Y$ takes the value
$b_i$ on $\omega_i$.

Assume first that $b_i \le b_j$ for $i < j$. The random variables $X$ and $Y$ in this case are said
to be comonotonic. Precisely, this means that for two sample points, if $X$ takes a higher value
on one of them, then $Y$ will also take a higher value on that point. In other words they move
together. (This concept can be generalized to arbitrary distributions and plays a major role
in risk assessment.) Take $\alpha = r/N$. In this case, the $N - r$ highest values of $X + Y$ will be of
the form $a_{r+1} + b_{r+1}, a_{r+2} + b_{r+2}, \ldots, a_N + b_N$ and from (22.10) it is clear that


$$\mathrm{TVaR}_\alpha(X + Y) = \mathrm{TVaR}_\alpha(X) + \mathrm{TVaR}_\alpha(Y). \tag{22.12}$$


This is a reasonable conclusion. In the case of comonotonicity, we cannot hope to have a 
high value in one random variable offset by a low value in the other. There is no diversification 
effect and combining the random variables does not lead to a reduction in the total risk measure. 
Now suppose that we remove the comonotonicity by rearranging the b’s. Obviously, the N — r 
highest values of a; + b; cannot get any larger than what we had in the previous case, where 
we included the N — r highest values of both the a’s and b’s. The left side of (22.12) must stay 
the same or decrease, and this implies the required subadditivity when $\alpha$ is as given. Now, we
can invoke (22.11) to see that it holds for any $\alpha$.
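
The rearrangement argument can be checked by brute force on a small sample space. The following sketch is illustrative only: the helper name `tvar_top` and the values of the a's and b's are assumptions, and the confidence level is restricted to the form r/N so that TVaR is simply the mean of the N - r largest values.

```python
from fractions import Fraction
from itertools import permutations

def tvar_top(values, r):
    """TVaR at level r/N for N equally likely values: mean of the N - r largest."""
    xs = sorted(values)
    return Fraction(sum(xs[r:]), len(xs) - r)

a = [1, 2, 4, 7, 11]      # values of X, in increasing order
b = [0, 3, 5, 6, 10]      # values of Y, in increasing order (comonotonic with X)
r = 3                     # confidence level r/N = 3/5

bound = tvar_top(a, r) + tvar_top(b, r)
# Rearranging the b's over the sample points removes the comonotonicity.
rearranged = [tvar_top([x + y for x, y in zip(a, perm)], r) for perm in permutations(b)]
print(max(rearranged) == bound)               # True: equality in the comonotonic case
print(all(t <= bound for t in rearranged))    # True: subadditivity at this level
```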

Besides subadditivity, there are other desirable features of risk measures that are satisfied 
by TVaR but not VaR. See, for example, Exercise 18.13. 

Following are two examples which compute TVaR for familiar distributions. 


Example 22.3 Compute TVaR for a normal distribution. 


Solution. In the case of a standard normal $Z$, the density function satisfies $x f_Z(x) = -f_Z'(x)$.
From (22.8) and the fundamental theorem of calculus,

$$\mathrm{TVaR}_\alpha(Z) = \frac{1}{1-\alpha} f_Z(\Phi^{-1}(\alpha)),$$

where $\Phi$ is the c.d.f. of $Z$.

It is not difficult to show that for any $X$ and constants $a$ and $b$, with $b > 0$, $\mathrm{TVaR}_\alpha(a + bX) =
a + b\,\mathrm{TVaR}_\alpha(X)$. Therefore, if $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$, we
have

$$\mathrm{TVaR}_\alpha(X) = \mu + \frac{\sigma}{1-\alpha} f_Z(\Phi^{-1}(\alpha)).$$


Example 22.4 Compute TVaR for an exponential distribution.


Solution. First we note that if $X \sim \mathrm{Exp}(\lambda)$ then $q_\alpha = s_X^{-1}(1 - \alpha) = -\log(1 - \alpha)/\lambda$. From
Equation (21.15), Example 21.7, and the fact that $s_X(q_\alpha) = 1 - \alpha$ by definition, it follows that
$E(X - q_\alpha)_+ = (1 - \alpha)/\lambda$. Now from Definition 22.5,

$$\mathrm{TVaR}_\alpha(X) = \frac{1 - \log(1 - \alpha)}{\lambda}.$$
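
As a numerical sanity check on the two closed forms, the sketch below compares them with a direct midpoint-rule evaluation of the integral in (22.8). The function names, the confidence level 0.95, the rate 2, the truncation points and the step count are all assumptions made for illustration; `NormalDist` is from the Python standard library.

```python
import math
from statistics import NormalDist

def tvar_normal(alpha, mu=0.0, sigma=1.0):
    """Example 22.3: mu + sigma * f_Z(Phi^{-1}(alpha)) / (1 - alpha)."""
    z = NormalDist()
    return mu + sigma * z.pdf(z.inv_cdf(alpha)) / (1 - alpha)

def tvar_exponential(alpha, lam):
    """Example 22.4: (1 - log(1 - alpha)) / lambda."""
    return (1 - math.log(1 - alpha)) / lam

def tvar_by_integration(density, q, alpha, upper, steps=100_000):
    """Midpoint-rule approximation of (22.8), with the integral truncated at `upper`."""
    h = (upper - q) / steps
    total = 0.0
    for i in range(steps):
        x = q + (i + 0.5) * h
        total += x * density(x)
    return total * h / (1 - alpha)

alpha, lam = 0.95, 2.0
z = NormalDist()
normal_numeric = tvar_by_integration(z.pdf, z.inv_cdf(alpha), alpha, upper=12.0)
q_exp = -math.log(1 - alpha) / lam
exp_numeric = tvar_by_integration(lambda x: lam * math.exp(-lam * x), q_exp, alpha,
                                  upper=q_exp + 30.0)
print(tvar_normal(alpha), normal_numeric)
print(tvar_exponential(alpha, lam), exp_numeric)
```

In each printed pair the closed form and the numerical integral should agree to several decimal places.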


We conclude this section by providing an equivalent formulation of TVaR for continuous
distributions, which serves to illustrate further how the information in the tail which is ignored
in VaR gets incorporated into TVaR. In the integral (22.8), make the substitution $p = F_X(x)$.
We have $x = q_p(X)$, $dp = f_X(x)\,dx$ so that

$$\mathrm{TVaR}_\alpha(X) = \frac{1}{1-\alpha} \int_\alpha^1 q_p(X)\,dp.$$

This says that we can view $\mathrm{TVaR}_\alpha$ as an average of all the VaRs at confidence levels
greater than $\alpha$.
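
To illustrate the averaging interpretation, the following sketch averages the exponential quantiles over confidence levels above the chosen one and compares the result with the closed form of Example 22.4. The function names, the parameter values and the simple midpoint rule are assumptions chosen for illustration; the rule never evaluates the quantile at level 1, where it is infinite.

```python
import math

def var_exponential(p, lam):
    """Quantile of an exponential distribution with rate lam at level p."""
    return -math.log(1 - p) / lam

def tvar_as_average_var(alpha, lam, steps=100_000):
    """Midpoint-rule average of VaR_p over p in (alpha, 1)."""
    h = (1 - alpha) / steps
    total = sum(var_exponential(alpha + (i + 0.5) * h, lam) for i in range(steps))
    return total * h / (1 - alpha)

alpha, lam = 0.9, 2.0
averaged = tvar_as_average_var(alpha, lam)
closed_form = (1 - math.log(1 - alpha)) / lam
print(averaged, closed_form)
```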
22.5.4 Distortion risk measures 


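Here is another equivalent formulation of TVaR. For any $\alpha$ in the interval (0, 1), let $g_\alpha$ denote
the function on [0, 1] defined by

$$g_\alpha(x) = \begin{cases} \dfrac{x}{1-\alpha}, & \text{if } 0 \le x < 1 - \alpha, \\ 1, & \text{if } 1 - \alpha \le x \le 1. \end{cases}$$

Then from (21.10) and Definition 22.5,

$$\mathrm{TVaR}_\alpha(X) = \int_0^\infty g_\alpha(s_X(x))\,dx.$$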


A whole family of other risk measures arises if, in the above formula, we replace $g_\alpha$
by any continuous function g that increases from 0 to 1. These are known as distortion risk 
measures. Note that when g(x) = x we just get E(X) as the risk measure. The concept has 
been termed by some as a sort of dual approach to that of taking expected utility. In the latter 
case we distort the amounts paid by converting them into utilities. In this case we distort the 
probabilities. Smaller values of the survival function, corresponding to right tail events, are 
increased in value, in an attempt to reflect the risk. It can be shown that any concave function 
g will give a subadditive risk measure. 
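
For a nonnegative discrete random variable the survival function is a step function, so the distortion integral reduces to a finite sum. The sketch below is an illustration under that assumption; `distortion_measure` and `g_tvar` are hypothetical names. It recovers the coin-flip value of 1.5 from the TVaR distortion and the mean from the identity distortion.

```python
from fractions import Fraction

def distortion_measure(dist, g):
    """Integral of g(s_X(x)) over [0, infinity) for a nonnegative discrete X.

    `dist` maps each value of X to its probability.
    """
    total, previous, survival = Fraction(0), Fraction(0), Fraction(1)
    for x in sorted(dist):
        total += (x - previous) * g(survival)   # s_X is constant on [previous, x)
        survival -= dist[x]
        previous = x
    return total

def g_tvar(alpha):
    """The distortion function that reproduces TVaR at level alpha."""
    return lambda s: min(s / (1 - alpha), 1)

coin = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
tvar_half = distortion_measure(coin, g_tvar(Fraction(1, 2)))
mean = distortion_measure(coin, lambda s: s)
print(tvar_half, mean)  # 3/2 1
```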


Notes and references 


The order relation we introduced in Section 22.4 is sometimes referred to as the convex order 
in view of (22.6). The same definition with the equal mean hypothesis eliminated is known 
as the stop loss order. 

More detailed information on ordering risks can be found in Kaas et al. (2008). This same
reference contains additional material on utility theory.

Readers particularly interested in risk measures may consult Artzner et al. (1999), which
deals with ‘coherency’, a much discussed topic in recent years.


Exercises 
22.1 For any $\alpha > 0$ define a utility function by $u_\alpha(x) = -e^{-\alpha x}$. (This is known as exponential
utility.)

(a) Show that a person with this utility function is risk averse.

(b) Show that when comparing risky alternatives to maximize utility by using $u_\alpha$, the
result is independent of initial wealth.

(c) In Example 22.1 let $P_\alpha$ denote the premium that would be paid when using $u_\alpha$
as a utility function. Calculate Po 95, Poo, and $\lim_{\alpha \to 0} P_\alpha$. What happens to risk
aversion as $\alpha$ decreases?

22.2 Use Jensen’s inequality to prove the well-known arithmetic-geometric mean inequality.
For positive numbers $a_i$, $1 \le i \le n$,

$$(a_1 a_2 \cdots a_n)^{1/n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}.$$

Hint: Take an appropriate distribution and let $g(x) = \log x$.

22.3 Consider the two random variables X ~ exp(2) and Y ~ exp(3) + 1/6. Compare as
to riskiness according to the definition given in Section 22.4. That is, is X less risky
than Y, or is Y less risky than X, or are they incomparable?

22.4 Compare the following three distributions as to riskiness:

X takes the value 1 with probability 1/6, 2 with probability 1/3, 3 with probability
1/3 and 6 with probability 1/6.

Y takes the value 2 with probability 1/2, 3 with probability 1/3 and 5 with probability
1/6.

Z takes the value 1 with probability 1/6, 2 with probability 1/6, 3 with probability
1/3, and 4 with probability 1/3.

22.5 (a) Show that X less risky than Y implies that the variance of X is less than the variance
of Y.

(b) Show that the converse is true in the normal case. That is, for normal random
variables X and Y with the same mean, the one with the smaller variance will be
less risky.

22.6 (a) On a sample space of three points, $\omega_1, \omega_2, \omega_3$, a random variable X takes the
values (1,1,3) and a random variable Y takes the values (1,2,3). Consider the risk
measure H(Z) = E[Z|Z > qo]. Show that despite the fact that Y takes values at all
sample points that are greater than or equal to those of X, we have H(X) > H(Y).

(b) Find an example that works as the above only now with $H(Z) = E[Z \mid Z \ge q_\alpha]$ for
some $\alpha$.

22.7 Suppose X ~ Gamma(2, 3). For a certain value of $\alpha$, $\mathrm{VaR}_\alpha(X) = 4$. Find $\mathrm{TVaR}_\alpha(X)$.

22.8 The random variable X has a density function given by


X, ifüxxc«l, 
fo - {> ifl<x<2. 


Find $\mathrm{VaR}_\alpha(X)$ and $\mathrm{TVaR}_\alpha(X)$ for $\alpha =$ (a) 1/2, (b) 31/32.


22.9 Find $\mathrm{VaR}_\alpha(X)$ and $\mathrm{TVaR}_\alpha(X)$ when X is uniform on [0, N].

22.10 A random variable X takes the values 1 with probability 4/7, 3 with probability 2/7
and 6 with probability 1/7. Find VaRp g(X) and TVaRp (X).

22.11 A random variable takes on the values $x_1, x_2, \ldots, x_{40}$, each with probability 1/40. If
$\mathrm{TVaR}_{0.95}(X) = 100$ and $\mathrm{TVaR}_{0.975}(X) = 150$, find $\mathrm{TVaR}_{0.96}(X)$.

22.12 On a sample space consisting of four points that have equal probability, the random
variables X and Y take, respectively, the values (1,2,3,4) and (4,1,2,3). Consider the
distortion risk measure H given by the function $g(x) = x^{1/2}$.

(a) Calculate H(X), H(Y), H(X + Y) and verify that the subadditivity holds.

(b) Suppose that Z is another random variable on this space taking the values
(a, b, c, d), where $a \le b \le c \le d$. Verify that H(X + Z) = H(X) + H(Z).

22.13 Suppose that X and Y both take N values each with probability 1/N, E(X) = E(Y) and
that X is less risky than Y. Show that $\mathrm{TVaR}_\alpha(X) \le \mathrm{TVaR}_\alpha(Y)$, but it is not necessarily
true that $\mathrm{VaR}_\alpha(X) \le \mathrm{VaR}_\alpha(Y)$.

22.14 (a) If X is a distribution such that $\mathrm{TVaR}_\alpha(X) - \mathrm{VaR}_\alpha(X)$ is independent of $\alpha$, what
must this constant difference be?

(b) Show that an exponential distribution has this property.

22.15 (This question refers back to material introduced in Part I of the book.) Assume that
interest rates are positive, that is, the investment discount function v(t) is decreasing
with t.


(a) Show that a(1,; v) is a concave function of t. 
(b) Use Jensen's inequality to show that a(1 e2) > a, 


(c) Now assume constant interest. A 1-unit whole life policy on (x) has premiums
payable continuously at the annual rate of P. Show that 


P E[ValyyA7Qy3 ¥)] = 1. 


That is, the expected amount of premiums accumulated at death by a policyholder 
is greater than or equal to the benefit payment at that time. 


(d) Explain the inequality in (c) by general reasoning. (You may want to refer to 
Section 8.4.3.) 


(e) Show that the inequalities in (b) and (c) are equalities at 0 interest.


23 


Ruin models 


23.1 Introduction 


This chapter involves extending some aspects of the collective risk model to a multi-period 
setting. It will require a sound knowledge of the material in Chapter 18. We begin with the 
discrete-time case and consider another interpretation of Equation 18.3. 

Consider an insurer who each period collects total premiums of $c$ and experiences aggregate
claims of $(N, X)$ as defined at the end of Section 21.1. Then the gain of the insurer in the
$n$th period is given by a random variable $G_n$ where


$$G_n \sim c - (N, X).$$


If we assume that claims each period are independent of those in other periods, we can interpret
Equation (18.3) as representing a surplus process of the insurer, where $U_n$ is the surplus at
time $n$ resulting from an initial surplus of $u$ at time 0. This will be a major application for the
theory in this chapter, although it applies as well to the original gambling formulation.

Let $T$ be the first time the surplus becomes negative. We call this the time of ruin. In the
discrete-time case, we define this formally as


$$T = \min\{n : U_n < 0\}.$$


The random variable $T$ is different from the other random variables we have encountered since
it is not necessarily real valued. For any realization for which the surplus is nonnegative at all
times, the value of $T$ will be $\infty$. The set of all such realizations can have positive probability,
in which case ruin is not certain. We are interested in the probability that ruin will eventually
occur. This will, of course, depend on the initial surplus $u$, so we denote this by $\psi(u)$. That is,


$$\psi(u) = P(T < \infty \mid U_0 = u).$$

Note that $\psi(u)$ is the probability of eventual ruin, which may seem to be of little interest
since nobody is planning to gamble or to run an insurance company forever. From a practical
point of view, one may want to compute the probability of ruin over some finite time horizon.
That is, one may want the probability that ruin will occur by a fixed time $t$. We denote this
by $\psi(u, t)$. So,


$$\psi(u, t) = P(T \le t \mid U_0 = u).$$


It is difficult to find general methods for calculating this quantity, and normally each case 
must be treated individually. The following example exhibits some of the possible techniques 
in a simple discrete time example. 


Example 23.1 A special insurance company has a single contract. There can be at most
one claim, and the probability that a claim does not occur by time $t$ is $1/(1 + t)$. If a claim
occurs, the amount is 100 with probability 0.6 or 200 with probability 0.4. Premiums are paid
continuously at the rate of 20 per year. The company begins with an initial surplus of 60.
What is the probability of eventual ruin?


Solution. This is really a finite-time question, since after time 7 ruin cannot occur, for the
insurer will have collected the maximum claim amount of 200. We break this interval up into
the relevant time periods.


(i) From time 0 to time 2, a claim will occur with probability 2/3, and the insurer will
necessarily be ruined since the initial surplus and premiums collected will be under
the minimum claim of 100.


(ii) From time 2 to time 7, a claim will occur with probability $\frac{1}{3} - \frac{1}{8} = \frac{5}{24}$ and ruin
will occur only if the claim is for 200. So the probability of ruin in this interval is

$$0.4 \times \frac{5}{24} = \frac{1}{12}.$$

The total probability of ruin is therefore $\frac{2}{3} + \frac{1}{12} = \frac{3}{4}$.
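
The arithmetic of this example can be reproduced with exact fractions. The sketch below simply mirrors the two time intervals used in the solution; the helper name `no_claim_by` is a hypothetical label for the probability 1/(1 + t) given in the example.

```python
from fractions import Fraction

def no_claim_by(t):
    """Probability that no claim has occurred by time t."""
    return Fraction(1, 1 + t)

# Surplus at time t is 60 + 20t: it reaches 100 at t = 2 and 200 at t = 7.
p_claim_0_to_2 = 1 - no_claim_by(2)                 # any claim here causes ruin
p_claim_2_to_7 = no_claim_by(2) - no_claim_by(7)    # only a claim of 200 causes ruin
p_large_claim = Fraction(2, 5)                      # P(claim amount is 200) = 0.4

ruin = p_claim_0_to_2 + p_claim_2_to_7 * p_large_claim
print(p_claim_0_to_2, p_claim_2_to_7, ruin)  # 2/3 5/24 3/4
```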


When $G$ is finite valued, the quantity $\psi(u, t)$ can be both childishly simple and fiendishly
difficult to compute. To illustrate this paradoxical statement, consider an example. You flip a
coin with probability of a head equal to $p$, and you win 1 for a head and lose 1 for a tail. What
is $\psi(1, 2)$? This is answered immediately since the only way you can be ruined by time 2 is to
get two tails in a row, and we conclude that $\psi(1, 2) = (1 - p)^2$. In fact, whenever $G$ is finite
valued, we can always, in theory, compute $\psi(u, t)$ by simply looking at all possible paths up
to time $t$ and seeing which ones lead to ruin. The problem is that if the range of $G$ and the time
$t$ are large enough, the number of such paths could be enormous, rendering any computation
infeasible. We therefore need other ways to get information. One such method is to compute
$\psi(u)$, which gives an upper bound since


$$\psi(u, t) \le \psi(u), \text{ for all } t.$$

The previous discussion then serves as a motivation for the main theme of this chapter,
which is to derive methods for calculating the infinite time ruin probability $\psi(u)$. There are
several ways to either compute this or estimate it, and we will discuss them in turn. Each is
useful for certain cases.
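
The path-counting idea for the finite-time ruin probability of the coin-flip game can be sketched as follows. This is a brute-force illustration only (the function name is hypothetical), and the number of paths doubles with each extra period, which is exactly the difficulty described above.

```python
from fractions import Fraction
from itertools import product

def finite_ruin_by_paths(u, t, p):
    """P(T <= t | U_0 = u) for steps +1 (probability p) and -1 (probability 1 - p)."""
    q = 1 - p
    total = Fraction(0)
    for path in product((1, -1), repeat=t):     # all 2**t paths of length t
        surplus, ruined = u, False
        for step in path:
            surplus += step
            if surplus < 0:                     # ruin: surplus becomes negative
                ruined = True
                break
        if ruined:
            wins = path.count(1)
            total += p**wins * q**(t - wins)    # probability of this full path
    return total

p = Fraction(1, 3)
print(finite_ruin_by_paths(1, 2, p), (1 - p)**2)  # 4/9 4/9
```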


Remark Some authors define ruin as the first time the surplus reaches zero, rather than
the first time it becomes negative. Let $\tilde{\psi}(u)$ denote the probability of ruin in this case. Then
$\tilde{\psi}(u) = \psi(u)$ if either $G$ is continuous or $u$ is not an integer. In the discrete case, if $u$ is a
positive integer, then $\tilde{\psi}(u) = \psi(u - 1)$.


23.2 A functional equation approach


Suppose you sit down to gamble with 300 units of capital. You divide this initial stake into two
piles, one with 200 and the other with the remaining 100. Now, to be ruined you first have to
lose the 200 pile, and following that you have to be ruined all over again starting with the 100 pile.
So, it seems reasonable to conclude that $\psi(300) = \psi(200)\psi(100)$, or, more generally, since
there is nothing special about these particular amounts, that $\psi(u + v) = \psi(u)\psi(v)$. Assuming
that our reasoning here is correct, we would know already with no calculation at all (and
ruling out certain degenerate cases) that $\psi(u)$ is an exponential function. Unfortunately, our
reasoning is not quite accurate. The problem is that the ruining bet, or claim, by definition,
will leave us with a deficit. In the original problem, we have to draw on our second pile in
order to pay for this, and we would not have the full 100 to continue the procedure. Suppose
we knew that our deficit at the time of ruin was some $d > 0$. That is, we would have a surplus
of $-d$ at ruin, and the appropriate equation would be


$$\psi(u + v) = \psi(u)\psi(v - d).$$


The following trick allows us to convert this into the form above. Let $p(u) = \psi(u - d)$;
then,


$$p(u + v) = \psi(u + v - d) = \psi(u - d + v) = \psi(u - d)\psi(v - d) = p(u)p(v).$$


This is the same functional equation we encountered in Section 2.6. Assuming that we
know that ruin probabilities are positive, and assuming some minimal regularity condition,
such as continuity at one point, we know that $p(u) = z^u$ for some $z$ between 0 and 1. Therefore,


$$\psi(u) = z^{u+d}. \tag{23.1}$$


Once again, however, we have to question our assumption. Is it ever possible that we
could know that the deficit at ruin had to be some fixed number $d$? The answer is ‘not very
often’, but it does happen in one particular case. Take the discrete-time model, for which the
values of $G$ are all nonnegative integers except for a single negative value of $-1$ (as for a
simple coin flip where $G = 1$ or $-1$) and for which the initial surplus $u$ is a positive integer.
In this case, the only way to be ruined is to reach a position where your surplus is 0 and
then to lose 1 in the next period. The deficit at ruin can only be 1. Formula (23.1) would give
us the ruin probabilities if we could only determine $z$. We will attempt to do so by using a
recursive technique that is a basic tool in ruin theory. Let $p$ denote the probability function of
$G$. Suppose you start with an initial surplus of 0. If your gain is $-1$ in the first period, you are
immediately ruined. If your gain is $k$ in the first period, your subsequent probability of ruin is
$\psi(k)$. Considering all possibilities, we have


$$\psi(0) = p(-1) + \sum_{k=0}^{\infty} p(k)\psi(k). \tag{23.2}$$

From (23.1), with $d = 1$,

$$z = p(-1) + \sum_{k=0}^{\infty} p(k) z^{k+1},$$

and dividing this equation by $z$, we can write

$$P_G(z) = 1, \tag{23.3}$$

where $P_G$ is the p.g.f. of $G$.


Example 23.2 Consider a game where you win 1 with probability $p > 1/2$, and lose 1 with
probability $1 - p$. What is $\psi(u)$?


Solution. From (23.3), we have


$$pz + \frac{1 - p}{z} = 1.$$


There are unfortunately two possible solutions to this, either $z = (1 - p)/p$ or $z = 1$. We do
not have a definite answer at this point but can only conclude that either


$$\psi(u) = \left(\frac{1 - p}{p}\right)^{u+1} \tag{23.4}$$


or

$$\psi(u) = 1, \text{ for all } u.$$


Note that $z = 1$ is a solution of (23.3) for all $G$, so our method would seem to have
accomplished little, leaving us in all cases with a possible conclusion that ruin is certain. We
will show however in the following sections that we can often rule out this possibility. This
will in fact be true in the present example for $p > 1/2$, and we will then know that the
probability of ruin is given by (23.4).


We can already deduce an interesting result in the case that $p = 1/2$. In that case the only
root is 1, and we know definitely that ruin is certain, regardless of the initial surplus. (This
is of course also true if $p < 1/2$.) This is one of the well-known results in ruin theory. It
says that even if you are playing a perfectly fair game, if you play it long enough, you will
eventually lose all your money. This may seem somewhat strange since you would seem to be
on equal grounds with your opponent (the casino, for example). You are not, however, since
there is an implicit assumption that the opponent has unlimited resources at its disposal, while
you only have the $u$ units you started with. We will return to a variation on this problem in
Example 23.4 below.
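
A numerical illustration (not a proof) of which root is the relevant one: the sketch below computes the finite-time ruin probability of the coin-flip game by backward recursion and compares it with formula (23.4) for a favourable game. The function name, the choice p = 0.7 and the horizons are assumptions made for illustration.

```python
def finite_ruin_probability(u, t, p):
    """P(T <= t | U_0 = u) for steps +1 (probability p) and -1 (probability 1 - p)."""
    top = u + t + 1                      # levels above u + t cannot be reached in t steps
    f = [0.0] * (top + 1)                # f[x]: ruin probability from surplus x, 0 steps left
    for _ in range(t):
        g = [0.0] * (top + 1)
        for x in range(top):
            after_loss = f[x - 1] if x >= 1 else 1.0   # a loss from surplus 0 is ruin
            g[x] = p * f[x + 1] + (1 - p) * after_loss
        f = g
    return f[u]

p, u = 0.7, 2
limit = ((1 - p) / p) ** (u + 1)         # formula (23.4)
for t in (5, 20, 100, 400):
    print(t, finite_ruin_probability(u, t, p), limit)
```

The finite-time values increase with the horizon and settle near the value given by (23.4) rather than near 1. Running the same function with p = 0.5 gives values that keep creeping upward as the horizon grows, in line with the certainty of ruin in the fair game.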

To summarize, this section has achieved only limited success in deducing ruin probabili- 
ties, but it is mainly intended as a motivation for methods to follow. One point that should be 
emphasized is that it illustrates the importance of considering the deficit at time of ruin (given 
that ruin occurs) when trying to deduce ruin probabilities. In the following sections, we will 
refer frequently to this random variable and denote it by D(u) (D for deficit). That is, 


$$D(u) = -U_T \mid (T < \infty,\ U_0 = u).$$


In the example above D(u) was always 1, but in general it is random and can depend on 
the initial surplus u. 


23.3 The martingale approach to ruin theory 


23.3.1 Stopping times


We will motivate the concept discussed here by looking at the gambling situation. Some 
gamblers claim that they can overcome unfavourable odds by a clever 'system'. This often 
takes the form of planning to stop at a certain point. For example, they will continue gambling 
until they have won $100 and then quit. That way, they claim, they are always a winner. Or, 
they will continue to bet on black until the wheel comes up black four times in a row, and 
then quit. They are using what is called a stopping time. Intuitively, a stopping time is a rule 
that tells you when to stop, and it must be such that you know about it when the time occurs. 
In other words, it depends only on the past and not the future. Stopping after four blacks in 
a row is a legitimate stopping time. A rule which says that whenever there are four blacks in 
a row, then you stop after the third one, is not a stopping time since you clearly will not be 
aware of that time when it occurs. (The concept is similar to the one we encountered in Section
20.5 when discussing trading strategies.)

Here is a more formal definition. A stopping time for a discrete-time stochastic process
is a rule that assigns to each realization $(x_n)$ of the process an integer $k$, the stopping time, in
such a way that if we assign $k$ to a realization $(x_n)$, and $(y_n)$ is a realization such that $x_n = y_n$
for $n = 1, 2, \ldots, k$, then we must assign the stopping time $k$ to $(y_n)$ as well.

To illustrate this definition, take the coin flipping example, starting with an initial surplus 
of 3, winning 1 for a head, and consider a rule which tells you to stop after the first head 
whenever you get two consecutive heads. This should not be a stopping time, and we can see 
that it does not satisfy the definition. A realization of the form (3,4, 5, ...) would be assigned 
1, but a realization of the form (3, 4,3, ...), which agrees with the first one up to time 1, would 
not be assigned 1. 

A stopping time will often be denoted by a letter such as $S$. A major example that we
have already encountered is when $S$ equals the time of ruin, which certainly satisfies the
requirement for a stopping time. A particularly simple example of a stopping time is $S = k$,
for some fixed $k$. That is, one stops at time $k$ regardless of what has happened.

A fundamental fact about the fixed stopping time is that for any martingale $(X_n)$, as defined
in Section 18.3,


$$E(X_k) = E(X_0), \text{ for all } k. \tag{23.5}$$

This is intuitively clear. In the case of a Markov chain, it is derived easily from


$$E(X_{n+1}) = \sum_x E(X_{n+1} \mid X_n = x) P(X_n = x) = \sum_x x P(X_n = x) = E(X_n),$$


and by induction we derive (23.5).


Example 23.3 Consider again the game of flipping a fair coin, winning 1 for heads, starting 
with an initial surplus of 3. The gambler decides to stop at time 4 or whenever two consecutive 
heads come up, if earlier. What is the expected surplus at the end of the game? 


Solution. This is complicated by the fact that we no longer have a Markov chain. There is, 
however, a useful general technique that allows us to recover the Markov property by adding 
states. In this example, instead of having a single state for each integer w, we insert a state wu 
to signify that there was a win on the previous play, following a loss on the play before; and 
we insert a state wd to signify a loss on the previous play. See Figure 23.1, where the shaded 
boxes indicate points where play stops. By counting paths, we calculate the expected surplus 
at stopping as 


$$E(X_S) = 5 \times \frac{1}{4} + 4 \times \frac{1}{8} + 5 \times \frac{1}{16} + 3 \times \frac{4}{16} + 1 \times \frac{4}{16} - \frac{1}{16} = 3.$$


The stopping rule has not helped to raise the expectation above the initial stake. 
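
The path count can be verified by enumerating all 16 equally likely sequences of four flips and applying the stopping rule to each. This is a small illustrative sketch; the function name `stopped_surplus` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def stopped_surplus(flips, start=3, horizon=4):
    """Surplus when play stops: at two consecutive heads, or at the horizon."""
    surplus, previous = start, None
    for n, flip in enumerate(flips, start=1):
        surplus += 1 if flip == "H" else -1
        if (flip == "H" and previous == "H") or n == horizon:
            return surplus
        previous = flip
    return surplus

# Each sequence of four flips of a fair coin has probability 1/16.
expected = sum(Fraction(stopped_surplus(flips), 16) for flips in product("HT", repeat=4))
print(expected)  # 3
```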


Figure 23.1 Tree for Example 23.3


23.3.2 The optional stopping theorem and its consequences


This motivates the question of whether it is always true in the case of a martingale that
$E(X_S) = E(X_0)$. That is, does (23.5) hold when the fixed time $k$ is replaced by an arbitrary
stopping time? The answer is no. Starting with a positive initial surplus of $u$, we flip a fair
coin repeatedly, wagering any amount $b$ that we choose, and receiving back $2b$ for a head and
nothing for a tail. There are two situations we want to present. In the first, we bet 1 unit each
time and continue until we lose all of our initial stake. The stopping time $S$ is the first time the
surplus reaches 0. Trivially, $E(X_S) = 0 \ne u$. The second case is the familiar doubling strategy.
Bet 1 unit on the first toss, and double the bet each successive play. $S$ is the first time a head
comes up, and it is not hard to see that $E(X_S)$ is $u + 1$, so we are sure to gain 1. The problem is
that neither of these strategies is feasible in practice. (Of course the first is irrational as well.)
In both cases, the amount of time we need is unbounded, and in the second case the amount
of capital we need is unbounded as well. We will, however, get an affirmative answer to our
question if there exists a suitable bound on a combination of the stopping time and values.
The following theorem gives a precise condition. Observe first that in place of $X_S$, which is
not defined if $S = \infty$, we want in general to consider $X_S \mid S < \infty$.


Theorem 23.1 (Optional stopping theorem) Suppose that $\{X_n\}$ is a martingale and $S$ is a
stopping time such that


$$\lim_{n \to \infty} E(X_n \mid S > n) P(S > n) = 0. \tag{23.6}$$

Then

$$E(X_0) = E(X_S \mid S < \infty) P(S < \infty).$$

Proof. For any $n$,

$$E(X_0) = E(X_n) = E(X_n \mid S \le n) P(S \le n) + E(X_n \mid S > n) P(S > n). \tag{23.7}$$

Using a modification of Equation (A.29), the third term above can be written as


$$\sum_{k=0}^{n} E(X_n \mid S = k) P(S = k).$$


In view of the martingale property,

$$E(X_n \mid S = k) = E(X_k \mid S = k) = E(X_S \mid S = k).$$


The entire third term therefore reduces to $E(X_S \mid S \le n) P(S \le n)$. We now simply take limits
as $n \to \infty$ to reach the conclusion.


Corollary In the case where $S$ is finite valued, we have $E(X_S) = E(X_0)$ in either of the
following cases:


(i) $S$ is bounded. That is, for some $N > 0$, we have $S \le N$.


(ii) The values of $X_n$ are bounded in absolute value prior to stopping. That is, there is a
constant $C$ such that, for all $n$, if $S > n$ then $|X_n| \le C$.

Proof. In (i), the second factor in (23.6) is 0 for $n > N$, so (23.6) necessarily holds. In (ii), the
second factor in (23.6) approaches 0 by the fact that $S$ is finite, and the first factor is bounded
in absolute value by $C$, so the product converges to 0.


Example 23.4 A gambler starting with an initial fortune of $a$ units repeatedly plays an even
money game against an adversary with an initial fortune of $b$. Each has an equal chance of
winning each game. They continue the play until one is broke. What is the probability that
the gambler with $a$ units will eventually lose his initial stake before the other does? (As an
equivalent formulation, we can remove any restriction from the opponent’s initial stake and
instead postulate that the gambler decides to quit upon losing $a$ or winning $b$.)


Solution. In Example 23.2 above, we essentially considered the case that $b$ was infinite, that
is there was no restriction on how much the opponent might lose, and we saw that ruin was
certain. Here, we deal with the more realistic case that $b$ is finite. Even a casino has some
upper bound on its available wealth. Let $U_k$ be the fortune of our gambler at time $k$. In view of
the fairness of the game, this is indeed a martingale. Let $S$ be the time that the gambler either
loses the initial stake of $a$ or wins $b$ from the opponent. This is a stopping time that certainly
satisfies condition (ii) of the Corollary to Theorem 23.1 since the values of $U_k$ range from 0 to
$a + b$. We will show later that $S$ must take a finite value, which allows us to use the Corollary.
Let $\pi$ be the probability that our gambler loses his initial stake. We can invoke the Corollary
to conclude that $E(U_S) = a$. But also, considering the two possibilities for $S$, we have


$$E(U_S) = (a + b)(1 - \pi) + 0 \cdot \pi,$$

and by equating, we obtain


$$\pi = \frac{b}{a + b}.$$


Note that $\pi$ approaches 1 as $b$ approaches $\infty$, verifying our conclusion following Example 23.2.


Example 23.5 Redo Example 23.4 assuming now that the probability of a win is p # 1/2. 
(This is a classical problem, often referred to as gambler’s ruin.) 


Solution. The difficulty now is that we no longer have a martingale. However, the following
ingenious trick allows us to transform the process into one. To simplify notation, let $q = 1 - p$.
Consider the process

$$X_n = \left(\frac{q}{p}\right)^{U_n}.$$

For any $n$, $U_{n+1} = U_n + G_{n+1}$, where $G_n$ takes the value 1 with probability $p$ and $-1$ with
probability $q$. Therefore,

$$X_{n+1} = \left(\frac{q}{p}\right)^{U_n + G_{n+1}},$$

leading to

$$X_{n+1} = X_n \left(\frac{q}{p}\right)^{G_{n+1}}. \tag{23.8}$$

Now, invoking the independence of $U_n$ and $G_{n+1}$, and noting that
$E\left[(q/p)^{G_{n+1}}\right] = p(q/p) + q(p/q) = q + p = 1$,

$$E(X_{n+1} \mid X_n = x) = x\,E\left[\left(\frac{q}{p}\right)^{G_{n+1}}\right] = x,$$


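showing that $\{X_n\}$ is a martingale. We can now duplicate the calculations in Example 23.4,
applied to $X_n$. We have that

$$E(X_S) = E(X_0) = \left(\frac{q}{p}\right)^{a}$$

and

$$E(X_S) = (1 - \pi)\left(\frac{q}{p}\right)^{a+b} + \pi\left(\frac{q}{p}\right)^{0}.$$

Solving,

$$\pi = \frac{(q/p)^{a} - (q/p)^{a+b}}{1 - (q/p)^{a+b}}.$$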

We return to the unfinished business of showing that the stopping time S of the last 
two examples must assume a finite value. We have a finite-state Markov Chain with states 
0, 1,2, ..., a + b, corresponding to the amount held by the person who started with a. We can 
see, similarly to Example 18.3, that the states 1,2,...,a + b — 1 are transient and the states 
0 and $a + b$ are both absorbing and therefore recurrent. By Theorem 18.2 the process must
reach one of these two recurrent states, implying that $S$ cannot take a value of $\infty$.
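
The closed-form answers of Examples 23.4 and 23.5 can be checked numerically. The following is a minimal sketch in exact rational arithmetic; the function names are illustrative and not from the text. It evaluates the ruin probability and confirms that it satisfies the first-step equations $h(i) = p\,h(i+1) + q\,h(i-1)$ of the Markov chain on the states $0, 1, \ldots, a+b$, with $h(0) = 1$ and $h(a+b) = 0$.

```python
from fractions import Fraction


def ruin_probability(a, b, p):
    """Probability that the gambler loses the stake a before winning b.

    p is the probability of winning 1 in a single play (q = 1 - p of losing 1).
    """
    p = Fraction(p)
    q = 1 - p
    if p == q:
        # Fair game (Example 23.4): b / (a + b).
        return Fraction(b, a + b)
    r = q / p
    # Unfair game (Example 23.5).
    return (r**a - r**(a + b)) / (1 - r**(a + b))


def satisfies_first_step_equations(total, p):
    """Check h(i) = p*h(i+1) + q*h(i-1), where h(i) is the ruin probability
    for a gambler currently holding i out of a combined wealth of `total`."""
    p = Fraction(p)
    q = 1 - p
    h = [ruin_probability(i, total - i, p) for i in range(total + 1)]
    boundary_ok = h[0] == 1 and h[total] == 0
    interior_ok = all(h[i] == p * h[i + 1] + q * h[i - 1] for i in range(1, total))
    return boundary_ok and interior_ok


if __name__ == "__main__":
    print(ruin_probability(3, 2, Fraction(1, 2)))   # fair game
    print(ruin_probability(3, 2, Fraction(2, 5)))   # unfavourable game
    print(satisfies_first_step_equations(6, Fraction(2, 5)))
```

Using `Fraction` avoids rounding, so the first-step equations can be tested with exact equality rather than a tolerance.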

Here is another striking application of this idea. 


Example 23.6 A gambler plays a game in which she will either win 1000 with probability 
$p$, where $p < 1$, or lose 1 with probability $1 - p$. Suppose, however, that whenever she
accumulates more than 10 000 in winnings, a companion takes everything in excess of 10 000
to spend in the casino gift shop. What is the probability that the gambler will eventually go 
broke? 


Solution. The probability is 1. Regardless of the initial fortune or the value of p, the gambler 
in this case is sure to lose everything if she plays for a sufficiently long time. Once again we 
have a finite Markov chain with states taking values from 0 to 10 000. All states except 0 are
transient, as we showed in the previous example, and we are certain to reach the one recurrent
state of 0. 


Remark The key to the above example is the phrase ‘a sufficiently long time’. As a practical 
matter, most people would be happy to play this game for a high value of p and would expect 
to eventually emerge a winner. 


23.3.3 The adjustment coefficient 


We now wish to apply Theorem 23.1 to the discrete surplus process (18.3) as interpreted in
Section 23.1. We cannot, however, expect this to be a martingale. It will only be one if $E(G) =
0$, that is, if $c = E(N)E(X)$. This is unrealistic, since, as we noted in previous chapters, insurers
will invariably charge an amount above this expected value to guard against the possibility
that the aggregate claims will be higher than expected. That is, they take $c = (1 + \theta)E(N)E(X)$
for some $\theta > 0$. However, we can transform this process to be a martingale by the same type
of procedure as we used in Example 23.5. The following definition gives the basic tool for
doing this.


Definition 23.1 An adjustment coefficient of a random variable $G$ is a positive number $R$
satisfying

$$M_G(-R) = 1. \qquad (23.9)$$

If $R$ is an adjustment coefficient of a discrete random variable, then for $z = e^{-R}$ it follows
that $P_G(z) = 1$, so we have already essentially seen this idea in (23.3).

To justify the word ‘the’ in the title of this subsection, we will show that there cannot be
two positive numbers satisfying (23.9). Let $\gamma$ be the supremum of all points for which $M_G(-r)$
is defined. (For example, if $G = c - W$ where $W \sim \mathrm{Exp}(\beta)$, then $\gamma$ will simply be equal to $\beta$.)
In many cases $\gamma = \infty$. Define a function $\phi(r) = M_G(-r) - 1 = E(e^{-rG}) - 1$ on the interval
$[0, \gamma)$. Then

$$\phi'(r) = -E(Ge^{-rG}), \qquad \phi''(r) = E(G^2e^{-rG}) > 0,$$

so $\phi$ is a convex function that takes the value 0 at the point 0 and, therefore, cannot have more
than one positive root (see Figure 23.2).

An equally pertinent problem is to decide if a positive root of $ exists. We can show that 
this will almost always happen in view of the following two properties that G will satisfy in 
any realistic insurance context: 


(i) $E(G) > 0$;
(ii) $P(G < 0) > 0$.


As we mentioned above, (i) will hold due to the relative risk loading. Moreover, there 
must be the possibility of paying out more in claims than the premiums collected, or nobody 
would ever buy insurance. This will imply (ii).


Figure 23.2 Graph of the function $\phi$


Now (i) implies that $\phi'(0) = -E(G) < 0$, so $\phi$ will start out negative. We must, therefore,
have a positive root as long as $\lim_{r \to \gamma} M_G(-r) > 1$. From condition (ii), we can find
positive numbers $s$ and $\delta$ such that $P(G < -s) > \delta$. This means that $E(e^{-rG}) > \delta e^{rs} > 1$ for sufficiently
large $r$. This guarantees a positive root whenever $\gamma = \infty$ (for example, when $G$ is finite).

There are a few rare examples, which we will not give here, where $\gamma$ is finite but
$\lim_{r \to \gamma} M_G(-r) < 1$, so the adjustment coefficient will not exist. (This is aside from those
cases where the m.g.f. does not exist in the first place, so we cannot even define the adjustment
coefficient.)

It is sometimes convenient to extend the definition of the adjustment coefficient to the
extreme cases. If $\phi(t) < 0$ for all $t$, we say that $R = \infty$; while if $\phi(t) > 0$ for all $t$, we say that
$R = 0$.

The adjustment coefficient can be taken as a measure of safety. The higher R is, the less 
risk there is for the insurer. This is indicated by Theorem 23.4 below, which shows that as R 
increases, our upper bound for the probability of ruin decreases. This principle is also seen in 
Examples 23.2 and 23.5. These examples show that for the case where G takes the value 1 with 
probability p and —1 with probability 1 — p, the adjustment coefficient is log(p) — log(1 — p), 
which increases as p does. The following is yet another example. 


Example 23.7 Find the adjustment coefficient for a normal distribution with mean $\mu$ and
variance $\sigma^2$.


Solution. The m.g.f. of the normal is given in (A.52). We have $e^{-R\mu + \sigma^2R^2/2} = 1$, so that
$-R\mu + \sigma^2R^2/2 = 0$, and since $R > 0$,

$$R = \frac{2\mu}{\sigma^2}. \qquad (23.10)$$


We see that for two normals with the same mean, as the variance gets smaller, which should
signify less variation and therefore less risk, the adjustment coefficient goes up.
Equation (23.10) is often used as an approximation to the adjustment coefficient for
other distributions. This is justified by the following argument. The function $\log M_G(t)$ has
first derivative equal to $M_G'(t)/M_G(t)$, which at $t = 0$ equals $E(G)$. Differentiating this latter
expression yields that the second derivative at 0 is equal to $\mathrm{Var}(G)$. We can then write (23.9)
in the form

$$0 = \log M_G(-R) = -R\mu + \frac{\sigma^2}{2}R^2 + \cdots,$$

and ignoring powers of $R$ higher than 2 gives (23.10).
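
To see how close the approximation (23.10) can be, the sketch below finds the adjustment coefficient of a discrete gain $G$ by bisection on $\phi(r) = E(e^{-rG}) - 1$ and compares it with $2\mu/\sigma^2$. The function names and the bracketing strategy are illustrative choices, not taken from the text; the routine assumes $E(G) > 0$ and $P(G < 0) > 0$, so that $\phi$ is negative just to the right of 0 and eventually positive.

```python
import math


def phi(r, values, probs):
    """phi(r) = E[exp(-r*G)] - 1 for a discrete random variable G."""
    return sum(p * math.exp(-r * g) for g, p in zip(values, probs)) - 1.0


def adjustment_coefficient(values, probs, iterations=200):
    """Positive root of phi, located by bisection (phi is convex with phi(0) = 0)."""
    mean = sum(g * p for g, p in zip(values, probs))
    if mean <= 0 or min(values) >= 0:
        raise ValueError("need E(G) > 0 and P(G < 0) > 0")
    hi = 1.0
    while phi(hi, values, probs) <= 0:   # move right until phi is positive
        hi *= 2.0
    lo = hi
    while phi(lo, values, probs) >= 0:   # move left until phi is negative
        lo /= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if phi(mid, values, probs) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_approximation(values, probs):
    """The approximation R ~ 2*mu/sigma^2 of (23.10)."""
    mean = sum(g * p for g, p in zip(values, probs))
    var = sum(g * g * p for g, p in zip(values, probs)) - mean**2
    return 2.0 * mean / var


if __name__ == "__main__":
    values, probs = (1, -1), (0.6, 0.4)   # G = +1 w.p. 0.6, -1 w.p. 0.4
    print(adjustment_coefficient(values, probs))   # compare with log(0.6) - log(0.4)
    print(normal_approximation(values, probs))
```

For this two-point gain the exact value is $\log(p) - \log(1 - p)$, so the bisection result can be compared against a known answer before the routine is trusted on other distributions.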
The following example will be extensively used in the continuous-time models. 


Example 23.8 Suppose that $G = c - (N, X)$, where $N \sim \mathrm{Poisson}(\lambda)$. Find an expression
for $R$ in terms of $X$ and $\lambda$.


Solution. Using (A.50) and (A.51), $M_G(-r) = e^{-cr}M_{(N,X)}(r)$, and then using (A.42) and
(21.4), this is equal to $e^{-cr}e^{\lambda(M_X(r) - 1)} = e^{-cr + \lambda(M_X(r) - 1)}$. For this to equal 1, the exponent must
equal 0, and dividing by $\lambda$, we get

$$M_X(R) = 1 + \left(\frac{c}{\lambda}\right)R. \qquad (23.11)$$

Since $c = (1 + \theta)\lambda E(X)$, where $\theta$ is the relative risk loading, we can also write

$$M_X(R) = 1 + (1 + \theta)E(X)R. \qquad (23.12)$$


23.3.4 The main conclusions 

A key fact about the adjustment coefficient is that the process $X_n = e^{-RU_n}$ is a martingale.
This follows by exactly the same calculation as given in Example 23.5 where we used the
fact that $E(e^{-RG}) = 1$. We now obtain a major conclusion by applying Theorem 23.1 to this
martingale. We must first verify that condition (23.6) holds.


Theorem 23.2 For the process given in (18.3), suppose that $E(G) > 0$ and that $\mathrm{Var}(G)$
exists. Then, for any positive $r$,

$$E(e^{-rU_n} \mid T > n)\,P(T > n) \to 0 \quad \text{as } n \to \infty.$$

Proof. Let $\mu$ be the mean and let $\sigma$ be the standard deviation of $G$, so that

$$E(U_n) = u + n\mu, \qquad \mathrm{Var}(U_n) = n\sigma^2.$$

Consider the following events referring to the situation at time $n$. Let $A_n$ be the event that
$T > n$ and $U_n > E(U_n)/2$, and let $B_n$ be the event that $T > n$ and $0 \le U_n \le E(U_n)/2$. Since
$T > n$ means that $U_n$ must be nonnegative, we see that the event $T > n$ is a disjoint union of
$A_n$ and $B_n$.
We also know from Chebyshev’s inequality (A.11) that

$$P(B_n) \le \frac{4\mathrm{Var}(U_n)}{E(U_n)^2} = \frac{4n\sigma^2}{(u + n\mu)^2},$$

so that $P(B_n) \to 0$ as $n \to \infty$. Then,

$$E(e^{-rU_n} \mid T > n)\,P(T > n) = E(e^{-rU_n} \mid A_n)\,P(A_n) + E(e^{-rU_n} \mid B_n)\,P(B_n).$$

As $n$ goes to $\infty$, the second term approaches zero because the first factor is bounded by 1,
and the first term is less than $e^{-r(u + n\mu)/2}$ and also approaches 0.

We can now apply Theorem 23.1 to the martingale $e^{-RU_n}$ with the stopping time $T$ to
conclude that

$$E(e^{-RU_T} \mid T < \infty)\,\psi(u) = e^{-Ru}.$$

While stopping theorems are normally used to derive the expectation at time of stopping,
in this instance we get an expression for the probability of ruin. Note that $-U_T \mid T < \infty$ is just
the deficit at ruin, which we termed $D(u)$ before, so we have the following theorem.


Theorem 23.3 If the adjustment coefficient $R$ exists, then

$$\psi(u) = \frac{e^{-Ru}}{E\left[e^{RD(u)}\right]}.$$

Note that in the case that $D(u)$ has a constant value of 1, we recover (23.1) with $z = e^{-R}$.

In general, this theorem does not give us the exact value of y(u) since we will not know 
the denominator. We can say, however, that since R is positive, the denominator is greater than 
1, and we conclude the following. 


Theorem 23.4 If the adjustment coefficient $R$ exists, we have

$$\psi(u) < e^{-Ru}$$

and therefore

$$\lim_{u \to \infty} \psi(u) = 0.$$

This theorem tells us that we can make the probability of ruin as small as we like by taking
the initial surplus sufficiently high, which is certainly a reasonable conclusion. It also tells us
that the probability of ruin reduces exponentially as a function of initial surplus. For example,
if the upper bound to the probability of ruin given by Theorem 23.4 is less than 0.05, and we
double the initial surplus, we then know that the probability of ruin is less than 0.0025.
It is also sometimes useful to have a lower bound for the ruin probability. This is possible
if $G$ is bounded below by $-M$ for some $M > 0$. We then know that $D(u) \le M$, and we can
conclude from Theorem 23.3 that

$$e^{-R(u + M)} \le \psi(u).$$


We conclude this section with some remarks on what happens in the rare case that the
adjustment coefficient does not exist. We can still reach the second conclusion in Theorem
23.4. As long as $E(G) > 0$ and $M_G$ is defined on some interval of positive length about 0,
we can find $\beta > 0$ such that $M_G(-\beta) < 1$. We then have that $e^{-\beta U_n}$ is a supermartingale. We
obtain a version of (23.7) with $\ge$ replacing the first equality sign, and this is enough to derive
the conclusion of Theorem 23.4 with $\beta$ replacing $R$.


23.4 Distribution of the deficit at ruin 


What can we say about the denominator in the expression for the ruin probability given in 
Theorem 23.3, other than that it is greater than 1? Can we ever evaluate it exactly in cases 
other than that mentioned in Section 23.1, where the values of G are nonnegative integers 
except for a single negative value of —1? In this section, we will try to provide some insight 
into these questions. To do so, we introduce some additional random variables. 

Let $Y = -G \mid G < 0$. For example, if $G$ takes the values $\{4, 3, -1, -2, -3\}$ with respective
probabilities (0.4, 0.1, 0.1, 0.25, 0.15), then $Y$ will take the values $\{1, 2, 3\}$ with respective
probabilities (0.2, 0.5, 0.3). The significance of $Y$ is that only the negative values of $G$ can
bring about ruin.

Let J(u) denote the value of the surplus in the period prior to ruin, assuming an initial 
surplus of u (the J stands for ‘just before’). The connection between all these is that D(u) will 
be equal to some value of Y — J(u). (In what follows, we will just write D and J, suppressing 
$u$, which will be fixed.)

To illustrate, take $G$ as given above, and suppose that $u = 2$. Given a realization $G_1 = 1$,
$G_2 = -1$, $G_3 = -3, \ldots$, ruin will occur at time 3, $J$ will equal 2, and $D$ will equal 1. For a
realization $G_1 = 1$, $G_2 = -1$, $G_3 = -2$, $G_4 = -3, \ldots$, ruin will take place at time 4, $J = 0$,
and $D = 3$.

We can make the following observations about this example. J can take possible values 
of 0, 1, or 2. When J takes the value 2, then D will take the value 1 with certainty. When J 
takes the value of 0, then any of the three negative amounts will cause ruin, so D will have 
the same distribution as Y. When J takes a value of 1, then the ruining claim must be either 2 
or 3, so D will take the value 1 with probability 5/8, and 2 with probability 3/8, as these are 
the conditional probabilities of values of Y given that Y > 1. 
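
The conditional distributions quoted in this example can be reproduced mechanically: derive $Y$ from $G$ by conditioning on $G < 0$, and then condition $Y$ on exceeding the surplus $j$ held just before ruin. The helper names below are illustrative only, and the sketch uses exact fractions so the results can be compared directly with the values 5/8 and 3/8.

```python
from fractions import Fraction


def loss_distribution(g_dist):
    """Distribution of Y = -G given G < 0, as a dict {value: probability}."""
    negative = {-g: p for g, p in g_dist.items() if g < 0}
    total = sum(negative.values())
    return {y: p / total for y, p in sorted(negative.items())}


def deficit_given_surplus(y_dist, j):
    """Distribution of D given J = j: the law of Y - j conditional on Y > j."""
    tail = sum(p for y, p in y_dist.items() if y > j)
    return {y - j: p / tail for y, p in sorted(y_dist.items()) if y > j}


# The gain distribution used in the text.
G_DIST = {
    4: Fraction(2, 5),
    3: Fraction(1, 10),
    -1: Fraction(1, 10),
    -2: Fraction(1, 4),
    -3: Fraction(3, 20),
}

if __name__ == "__main__":
    y_dist = loss_distribution(G_DIST)
    print(y_dist)
    for j in (0, 1, 2):
        print(j, deficit_given_surplus(y_dist, j))
```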

In the general case, if $J = j$, then we know that the ruining claim $Y$ must take a value
greater than $j$, and in particular for $D$ to take a value of $d$, we need that $Y = j + d$. This gives

$$P(D = d) = \sum_{j} P(J = j)\,\frac{f_Y(d + j)}{s_Y(j)}, \qquad d = 1, 2, \ldots.$$

If only we knew the distribution of $J$, this would give us the distribution of $D$. Unfortunately,
deducing the distribution of $J$ is just as difficult as getting that of $D$, so we would appear to
have simply gone around in circles, with no gain of information.

This is not quite true, however. For one thing, our analysis indicates some features about 
the distribution of D. It is somewhat related to that of Y. It takes exactly the same values, 
and it will in fact have the same distribution as Y whenever J = 0. (In our simple example of 
Section 23.2 we knew in fact that J was always 0.) In general, however, it will involve more 
mass at the lower values than Y, for when J takes values higher than 0, there is less chance 
for D to assume higher values. 

A second observation is that there is one case where we can indeed use the formula above.
Suppose that $Y \sim \mathrm{Geom}(p)^+$, using the notation we introduced at the beginning of Section
21.7. For this distribution, $f_Y(d + j)/s_Y(j) = (1 - p)p^{d+j-1}/p^{j} = f_Y(d)$. The second factor in
the summation is independent of $j$, so it can be factored out, and we conclude that

$$P(D = d) = f_Y(d)\sum_{j} P(J = j) = f_Y(d).$$

In other words, we have the following:

If $Y \sim \mathrm{Geom}(p)^+$, then for all $u$, $D(u) \sim \mathrm{Geom}(p)^+$. (23.13)


We will use this example later to motivate a result in the continuous-time case. 


23.5 Recursion formulas 


In some cases, we can use recursion to calculate exact ruin probabilities as well as the exact 
distribution of surplus at time of ruin. 


23.5.1 Calculating ruin probabilities 
Suppose we have the insurance claims model

$$G = c - (N, X),$$

and that the following restrictions hold:

(i) $c = 1$.
(ii) $N$ takes the values 0 or 1 with probabilities $1 - q$ or $q$, respectively.
(iii) $X$ takes positive integer values $1, 2, \ldots, K$, with probabilities $f(1), f(2), \ldots, f(K)$, respectively.
(iv) $1 > qE(X)$, so that $E(G) > 0$.
(v) The initial surplus $u$ is a positive integer.


Note that (i) and (iii) simply say that all claim amounts are integer multiples of the premium, 
since we can always take the amount of the premium as the unit of capital. 




To simplify the notation, we will first illustrate with $K = 3$. Clearly, the following calculations
will work for any value of $K$. Note that $G$ takes the values $1, 0, -1, -2$ with probabilities
$1 - q$, $qf(1)$, $qf(2)$, and $qf(3)$, respectively.

Suppose we start with $u$ units at time 0. In the first period, the four possible outcomes for
$G$ lead to four possible values of surplus at time 1. This gives us the following set of equations,
one for each value of $u$:

$$\begin{aligned}
\psi(0) &= (1 - q)\psi(1) + qf(1)\psi(0) + qf(2)\psi(-1) + qf(3)\psi(-2) \\
\psi(1) &= (1 - q)\psi(2) + qf(1)\psi(1) + qf(2)\psi(0) + qf(3)\psi(-1) \\
&\;\;\vdots \\
\psi(u) &= (1 - q)\psi(u + 1) + qf(1)\psi(u) + qf(2)\psi(u - 1) + qf(3)\psi(u - 2)
\end{aligned}$$


Note that for convenience we have included terms of the form $\psi(i)$ for $i < 0$. These, of course,
are just equal to 1. If we start with a negative amount, we are already ruined. Next, rearrange
the equations above to give

$$\begin{aligned}
\psi(0) - \psi(1) &= q[-\psi(1) + f(1)\psi(0) + f(2)\psi(-1) + f(3)\psi(-2)] \\
\psi(1) - \psi(2) &= q[-\psi(2) + f(1)\psi(1) + f(2)\psi(0) + f(3)\psi(-1)] \\
\psi(2) - \psi(3) &= q[-\psi(3) + f(1)\psi(2) + f(2)\psi(1) + f(3)\psi(0)] \\
\psi(3) - \psi(4) &= q[-\psi(4) + f(1)\psi(3) + f(2)\psi(2) + f(3)\psi(1)] \\
&\;\;\vdots \\
\psi(n) - \psi(n + 1) &= q[-\psi(n + 1) + f(1)\psi(n) + f(2)\psi(n - 1) + f(3)\psi(n - 2)]
\end{aligned}$$


Sum the first $n + 1$ of these equations. The left-hand side adds up to $\psi(0) - \psi(n + 1)$.
To add the right-hand side, it is convenient to add by the diagonals (running northwest to
southeast) because they involve the term $\psi(k)$ for the same value of $k$. Since $f(1) + f(2) +
f(3) = 1$, all of the diagonals will sum to zero except for the three on the upper right and the
three on the lower left. The third diagonal on the upper right sums to $q\psi(0)$. The first two
diagonals on the upper right sum to

$$q\sum_{i=1}^{3}(i - 1)f(i) = qE(X - 1).$$

The three diagonals on the lower left involve terms in $\psi(n + 1)$, $\psi(n)$, and $\psi(n - 1)$. These
will all converge to 0 as $n$ approaches $\infty$ by Theorem 23.4. Taking limits gives

$$\psi(0) = q[\psi(0) + E(X - 1)],$$

and solving

$$\psi(0) = \frac{q}{1 - q}E(X - 1). \qquad (23.14)$$

It is instructive to write this in terms of $\theta$, the relative risk loading. Since

$$1 + \theta = \frac{1}{qE(X)}, \qquad (23.15)$$

we can substitute for $E(X)$ in (23.14) to obtain

$$\psi(0) = \frac{1/(1 + \theta) - q}{1 - q},$$

which tells us that

$$\lim_{q \to 0}\psi(0) = \frac{1}{1 + \theta}. \qquad (23.16)$$


Now consider the general case with $X$ taking the $K$ values $1, 2, \ldots, K$. Formula (23.14)
gives us a starting value, and all of the ruin probabilities can be calculated recursively by
rearranging our first set of equations to get

$$\psi(n) = \frac{1}{1 - q}\left[\psi(n - 1) - q\big(f(1)\psi(n - 1) + f(2)\psi(n - 2) + \cdots + f(K)\psi(n - K)\big)\right], \qquad (23.17)$$

which we can write more compactly, using the notation of Section A.12.3, as

$$\psi(n) = \frac{1}{1 - q}\left[\psi(n - 1) - q(f * \psi)(n)\right].$$
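
The recursion is straightforward to program. The sketch below starts from (23.14), sets the ruin probability at every negative argument to 1, and then applies (23.17) step by step. The function name and the dictionary representation of $f$ are illustrative assumptions; exact fractions are used so that results can be compared with hand calculations.

```python
from fractions import Fraction


def ruin_probabilities(q, f, n_max):
    """Return [psi(0), psi(1), ..., psi(n_max)] for the model G = 1 - (N, X).

    q : probability of a claim in a period (N is 0 or 1).
    f : dict {i: P(X = i)} for the claim sizes i = 1, ..., K.
    """
    q = Fraction(q)
    f = {i: Fraction(p) for i, p in f.items()}
    K = max(f)
    mean_x = sum(i * p for i, p in f.items())
    if q * mean_x >= 1:
        raise ValueError("restriction (iv) requires q * E(X) < 1")

    psi = {n: Fraction(1) for n in range(-K, 0)}   # already ruined if u < 0
    psi[0] = q / (1 - q) * (mean_x - 1)            # starting value (23.14)
    for n in range(1, n_max + 1):
        conv = sum(p * psi[n - i] for i, p in f.items())   # (f * psi)(n)
        psi[n] = (psi[n - 1] - q * conv) / (1 - q)         # recursion (23.17)
    return [psi[n] for n in range(n_max + 1)]


if __name__ == "__main__":
    # q = 1/15 and a constant claim size of 4.
    for n, value in enumerate(ruin_probabilities(Fraction(1, 15), {4: 1}, 5)):
        print(n, value, float(value))
```

With $q = 1/15$ and $X \equiv 4$, the first two values are $3/14$ and $31/196$.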


Remark The quantity $\psi(0)$ is of importance since it gives us our starting value. One may
think, however, that it is intrinsically of little interest, since we are not likely in practice to
have a situation where the initial surplus is 0. It is, however, of great significance since given
any starting value $u$, we can interpret $\psi(0)$ as giving the probability that we will eventually
reach some point where our surplus is less than $u$. Similarly, $D(0)$ is the amount by which we
are less than the initial surplus, if this occurs. We will exploit this idea to great advantage in
Section 23.7.


23.5.2 The distribution of D(u) 


The same approach lets us deduce the distribution of $D(u)$. For $k = 1, 2, \ldots, K - 1$, let $\psi_k(u)$
be the probability that, starting with initial surplus $u$, ruin eventually occurs and the value of
$D(u) = k$. It will be convenient again to consider negative values of the argument, and we note
that

$$\psi_k(-r) = \begin{cases} 0, & \text{if } r \neq k, \\ 1, & \text{if } r = k. \end{cases}$$

Given values of $\psi_k(u)$, we can immediately find $\psi(u)$, as well as the distribution of $D(u)$,
since

$$\psi(u) = \sum_{k=1}^{K-1}\psi_k(u) \qquad (23.18)$$

and

$$P[D(u) = k] = \frac{\psi_k(u)}{\psi(u)}, \qquad k = 1, 2, \ldots, K - 1. \qquad (23.19)$$

We calculate $\psi_k(u)$ by following exactly the procedure in the previous subsection. We get
the same systems of equations except with $\psi$ replaced by $\psi_k$. When we sum the second set
of equations and take limits, we again will have everything on the right-hand side vanishing,
except possibly for a finite number of diagonals on the upper right. Since these all involve
negative values of the argument of $\psi_k$, these sums will also be zero except for the single diagonal
involving the terms $\psi_k(-k)$, and the sum of that will be $f(k + 1) + f(k + 2) + \cdots + f(K) =
P(X > k)$. So, in place of (23.14), we get

$$\psi_k(0) = \frac{q}{1 - q}P(X > k). \qquad (23.20)$$


From (23.14), (23.19), and (23.20), we immediately have a nice simple formula for the
distribution of $D(0)$:

$$P[D(0) = k] = \frac{s_X(k)}{E(X - 1)}, \qquad k = 1, 2, \ldots, K - 1. \qquad (23.21)$$


This verifies our intuition of the previous section. In our present case, we have Y = X — 1, 
and we see precisely how D(0) is a type of ‘shifted to the left’ version of Y. For example, if 
Y takes values 1, 2, 3, with probabilities 0.2, 0.5, 0.3 respectively, then D(0) takes the values 
1,2,3 with probabilities that are in the ratio, 1 : 0.8 : 0.3, so they are 10/21, 8/21, 3/21. 
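
The ratio calculation above can be checked with a few lines of code that apply (23.21) directly. As an assumption made only for this illustration, the claim size is taken to be $X = Y + 1$ with no claims of size 1, so that $X$ takes the values 2, 3, 4 with probabilities 0.2, 0.5, 0.3; the function name is likewise hypothetical.

```python
from fractions import Fraction


def deficit_from_zero(f):
    """P[D(0) = k] = s_X(k) / E(X - 1) for k = 1, ..., K - 1, as in (23.21).

    f : dict {i: P(X = i)} for the claim sizes.
    """
    K = max(f)
    mean_excess = sum((i - 1) * p for i, p in f.items())       # E(X - 1)
    survival = {k: sum(p for i, p in f.items() if i > k)       # s_X(k) = P(X > k)
                for k in range(1, K)}
    return {k: s / mean_excess for k, s in survival.items()}


# Illustrative assumption: X = Y + 1 with Y as in the text.
F_X = {2: Fraction(1, 5), 3: Fraction(1, 2), 4: Fraction(3, 10)}

if __name__ == "__main__":
    print(deficit_from_zero(F_X))
```

The probabilities sum to 1 because $\sum_{k \ge 1} s_X(k) = E(X) - 1$ when $X$ takes positive integer values.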

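We now obtain the same recursion formula as in (23.17), except with $\psi$ replaced by $\psi_k$,
that is,

$$\begin{aligned}
\psi_k(n) &= \frac{1}{1 - q}\left[\psi_k(n - 1) - q\big(f(1)\psi_k(n - 1) + f(2)\psi_k(n - 2) + \cdots + f(K)\psi_k(n - K)\big)\right] \\
&= \frac{1}{1 - q}\left[\psi_k(n - 1) - q(f * \psi_k)(n)\right].
\end{aligned}$$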


Example 23.9 $q = 1/15$, $X$ has a constant value of 4. Calculate $\psi_1(1)$, $\psi_2(1)$, $\psi_3(1)$, and
$\psi(1)$. Verify that your answer to the last agrees with that given by Theorem 23.3.


Solution. We note first that $f(1) = f(2) = f(3) = 0$, $f(4) = 1$. So

$$\psi_1(0) = \frac{1}{14}, \qquad \psi_1(1) = \frac{15}{14}\left(\frac{1}{14} - 0\right) = \frac{15}{196}.$$

Similarly,

$$\psi_2(0) = \frac{1}{14}, \qquad \psi_2(1) = \frac{15}{196},$$

$$\psi_3(0) = \frac{1}{14}, \qquad \psi_3(1) = \frac{15}{14}\left(\frac{1}{14} - \frac{1}{15}\right) = \frac{1}{196},$$

so that $\psi(1) = 31/196$, and $D(1)$ takes the values 1, 2, 3 with probabilities 15/31, 15/31, 1/31,
respectively. (Note that $D(1)$ differs from $D(0)$, which has a uniform distribution on 1, 2, 3.)

To check this by Theorem 23.3, we first solve for $R$, or equivalently $z = e^{-R}$. Since $G$
takes the value 1 with probability 14/15 or $-3$ with probability 1/15, we have

$$14z + z^{-3} = 15,$$

and we can verify that $z = 1/2$. So,

$$E\left[e^{RD(1)}\right] = 2\left(\frac{15}{31}\right) + 4\left(\frac{15}{31}\right) + 8\left(\frac{1}{31}\right) = \frac{98}{31}.$$

From Theorem 23.3,

$$\psi(1) = \frac{1/2}{98/31} = \frac{31}{196},$$

as above.
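
The whole of Example 23.9, including the check against Theorem 23.3, can be reproduced in exact arithmetic. The sketch below is one possible arrangement (the function and variable names are not from the text): it runs the recursion for each $\psi_k$, assembles the distribution of $D(1)$, and then evaluates $e^{-Ru}/E[e^{RD(u)}]$ with $z = e^{-R} = 1/2$.

```python
from fractions import Fraction


def deficit_ruin_probabilities(q, f, k, n_max):
    """Return [psi_k(0), ..., psi_k(n_max)]: ruin occurs with a deficit of exactly k."""
    q = Fraction(q)
    f = {i: Fraction(p) for i, p in f.items()}
    K = max(f)
    # At a negative argument -r the deficit is already r, so psi_k(-r) = 1 iff r == k.
    psi_k = {n: Fraction(int(n == -k)) for n in range(-K, 0)}
    tail = sum(p for i, p in f.items() if i > k)                # P(X > k)
    psi_k[0] = q / (1 - q) * tail                               # starting value (23.20)
    for n in range(1, n_max + 1):
        conv = sum(p * psi_k[n - i] for i, p in f.items())
        psi_k[n] = (psi_k[n - 1] - q * conv) / (1 - q)
    return [psi_k[n] for n in range(n_max + 1)]


Q, F = Fraction(1, 15), {4: 1}
U = 1

PSI_K = {k: deficit_ruin_probabilities(Q, F, k, U)[U] for k in (1, 2, 3)}
PSI = sum(PSI_K.values())                                       # (23.18)
DEFICIT_DIST = {k: v / PSI for k, v in PSI_K.items()}           # (23.19)

Z = Fraction(1, 2)                                              # z = e^{-R}
DENOMINATOR = sum(p * Z**(-k) for k, p in DEFICIT_DIST.items()) # E[e^{R D(1)}]
THEOREM_VALUE = Z**U / DENOMINATOR                              # Theorem 23.3

if __name__ == "__main__":
    print(PSI_K, PSI)
    print(DEFICIT_DIST)
    print(DENOMINATOR, THEOREM_VALUE)
```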


23.6 The compound Poisson surplus process 


23.6.1 Description of the process 


We now turn to ruin calculations in the continuous case. This will be based on a compound 
Poisson process which is simply a process corresponding to the compound Poisson distribu- 
tion, which we considered in Chapter 21. That is, instead of merely counting 1 every time an 
event occurs, we take an observation from some given distribution. So we have a distribution 
X, which we can think of as a severity distribution, and we have a Poisson process, N(t). The 
resulting compound Poisson process is given by 


$$S(t) = \sum_{k=1}^{N(t)} X_k,$$

where $\{X_k\}$ are independent, each has the same distribution as $X$, and they are independent
of $N(t)$. In other words, we can simply write

$$S(t) \sim (N(t), X).$$

We can now provide a model for a surplus process. Suppose that an insurer’s aggregate
claims up to time $t$ are given by a compound Poisson process $S(t)$, as above. In the discrete
case, we postulated a premium of $c$ per period. We now assume that the insurer collects
premiums at a continuous rate of $c$ per period, so in any period of length $h$ total premiums of
$ch$ will be collected. We assume also that the insurer begins with an initial surplus of $u$. The
compound Poisson surplus process is the process given by

$$U(t) = u + ct - S(t), \qquad (23.22)$$

where $U(t)$ is the surplus at time $t$. Our relative risk loading is given as in the discrete case by

$$1 + \theta = \frac{c}{\lambda E(X)}.$$

The probability of ruin is defined similarly to the definition in the discrete case, although
we need an infimum to replace the minimum. That is,

$$T = \inf\{t : U(t) < 0\}, \qquad \psi(u) = P(T < \infty \mid U(0) = u).$$


See Figure 23.3 for a typical realization of this process. The diagonal lines have slope c, 
and show the increase in surplus arising from the premium payments. Downward jumps then 
occur whenever there is a claim. 

Figure 23.3 A realization of the continuous-time surplus process

We can approach this case by considering an approximating discrete model. Suppose that
we divide up our time into periods of length $h$ for some small $h$. Then, if we only view our
surplus at the end of each period, we just have a discrete surplus model with the gain in each
period given by

$$G = ch - (N, X),$$

where $N \sim \mathrm{Poisson}(\lambda h)$. We will now go over the various results we obtained in the last two
sections to see how they are modified for the continuous model. In many instances, we do not
give rigorous proofs. However, all the results are motivated by those of the previous sections.


23.6.2 The probability of eventual ruin 


We first would like to calculate the adjustment coefficient in the continuous model. It is natural
to do this by calculating the adjustment coefficient of $G$ for the approximating discrete model
as given above, and then taking the limit as $h$ approaches 0. This turns out to be extremely
simple in the Poisson case. The term $c/\lambda$ that we get in (23.11) is independent of $h$, so we get
exactly the same answer as before. The adjustment coefficient is given by (23.11) or (23.12),
as it was in the discrete model. The probability of ruin is now given by Theorems 23.3 and
23.4, which carry over unaltered to the surplus model with compound Poisson aggregate
claims.


23.6.3 The value of y(0) 


In light of the Poisson assumption, and Theorem 18.3, we can view our continuous-time
model as a limiting case of the particular model discussed in Section 23.5. As $h$ goes to 0, so
will $q$ and we deduce, directly from (23.16), that

$$\psi(0) = \frac{1}{1 + \theta}.$$

This is somewhat surprising, since $\psi(0)$ does not depend on the particular distribution of
$X$, except through the expectation $E(X)$, which affects $\theta$. In other words, given any two
distributions for $X$, as long as they have the same mean, they will produce the same value
of $\psi(0)$.


23.6.4 The distribution of D(0) 


$D(0)$ is necessarily continuous here in view of the fact that premiums are collected continuously.
We can deduce, analogously to (23.21), that if $f_{D(0)}$ is the density function of $D(0)$, then

$$f_{D(0)}(d) = \frac{s_X(d)}{E(X)}. \qquad (23.23)$$
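
As a quick worked illustration of (23.23), not part of the original derivation, take $X \sim \mathrm{Exp}(\beta)$, so that $s_X(d) = e^{-\beta d}$ and $E(X) = 1/\beta$. Then

$$f_{D(0)}(d) = \frac{e^{-\beta d}}{1/\beta} = \beta e^{-\beta d}, \qquad d > 0,$$

which is again the $\mathrm{Exp}(\beta)$ density. For a severity distribution that is not exponential, the density $s_X(d)/E(X)$ will in general differ from that of $X$.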


For later purposes, we will need the m.g.f. of this distribution. Integrating by parts,

$$\int_0^\infty s_X(t)e^{rt}\,dt = \left[\frac{s_X(t)e^{rt}}{r}\right]_0^\infty + \frac{1}{r}\int_0^\infty f_X(t)e^{rt}\,dt = \frac{1}{r}\big(M_X(r) - 1\big),$$

so that

$$M_{D(0)}(r) = \frac{M_X(r) - 1}{rE(X)}. \qquad (23.24)$$


23.6.5 The case when X is exponentially distributed


It is easy to deduce from the distribution of $D(0)$ given above that when $X \sim \mathrm{Exp}(\beta)$, then also
$D(0) \sim \mathrm{Exp}(\beta)$. However, much more than this is true. As a continuous analogue to (23.13),
we obtain the following:

When $X \sim \mathrm{Exp}(\beta)$, then $D(u) \sim \mathrm{Exp}(\beta)$ for all $u \ge 0$.

The last observation tells us that we can get an exact calculation of the ruin probability
when $X$ is exponential. We first must calculate the adjustment coefficient. From (23.12), we
solve

$$1 + \frac{(1 + \theta)R}{\beta} = \frac{\beta}{\beta - R} = 1 + \frac{R}{\beta - R},$$

and it is immediate that

$$R = \frac{\beta\theta}{1 + \theta}.$$

Substituting in (23.12) gives

$$M_X(R) = 1 + \theta. \qquad (23.25)$$

Let us verify that our formula for $R$ makes sense intuitively, recalling that our adjustment
coefficient is a measure of safety. It increases as $\beta$ increases, which it should since the mean
of the severity distribution is decreasing. It must be 0 in the extreme case that $\theta = 0$, and it
must increase with $\theta$.

Note now that the denominator in Theorem 23.3 is just $M_{D(u)}(R)$. When $X \sim \mathrm{Exp}(\beta)$, $D(u)$
has the same distribution, and from (23.25) we know the denominator is $1 + \theta$. We substitute
in the statement of Theorem 23.3 to obtain a key result.


Theorem 23.5 For the surplus model, with compound Poisson aggregate claims, where the
severity distribution $X \sim \mathrm{Exp}(\beta)$,

$$\psi(u) = \frac{1}{1 + \theta}\,e^{-u\beta\theta/(1 + \theta)}.$$
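
A short numerical sketch of the exponential case follows; the function names and the parameter values are illustrative only. It evaluates the adjustment coefficient $R = \beta\theta/(1 + \theta)$, checks that $M_X(R) = \beta/(\beta - R)$ equals $1 + \theta$ as in (23.25), and compares the exact ruin probability with the upper bound $e^{-Ru}$ of Theorem 23.4.

```python
import math


def adjustment_coefficient_exponential(beta, theta):
    """R = beta * theta / (1 + theta) for Exp(beta) severities."""
    return beta * theta / (1.0 + theta)


def ruin_probability_exponential(u, beta, theta):
    """psi(u) = exp(-R*u) / (1 + theta), the formula of Theorem 23.5."""
    r = adjustment_coefficient_exponential(beta, theta)
    return math.exp(-r * u) / (1.0 + theta)


def lundberg_bound(u, beta, theta):
    """Upper bound exp(-R*u) from Theorem 23.4."""
    return math.exp(-adjustment_coefficient_exponential(beta, theta) * u)


if __name__ == "__main__":
    beta, theta = 2.0, 0.25              # hypothetical parameter values
    r = adjustment_coefficient_exponential(beta, theta)
    print(r, beta / (beta - r))          # second value should equal 1 + theta
    for u in (0.0, 1.0, 5.0, 10.0):
        print(u, ruin_probability_exponential(u, beta, theta), lundberg_bound(u, beta, theta))
```

In this case the bound overstates the true probability by the constant factor $1 + \theta$, and doubling $u$ squares the bound, which is the exponential decay noted after Theorem 23.4.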


23.7 The maximal aggregate loss 

We continue with the same continuous model as in the previous section and give an alternate
approach to ruin probabilities. This will provide another proof of Theorem 23.5, as well as
enabling us to deduce the ruin probability when $X$ is a mixture of gamma distributions.


Definition 23.2 The maximal aggregate loss, denoted by $\mathcal{L}$, is the largest amount by which
our surplus will be less than the beginning surplus. In other words,

$$\mathcal{L} = \max\{u - U(t) : t \ge 0\}.$$

For example, suppose we start out with an initial surplus of 10. Then, given a realization
in which the smallest value of surplus ever attained is $-7$, the maximal aggregate loss will be
17. It is clear that $\mathcal{L}$ is independent of $u$. In fact, we could write it as

$$\mathcal{L} = \max\{S(t) - ct : t \ge 0\}.$$

The significance of this random variable for ruin models is that

$$\psi(u) = P(\mathcal{L} > u). \qquad (23.26)$$

Therefore, if we know the distribution of $\mathcal{L}$, we can immediately write down the probability
of ruin. We will show in this section that for our compound Poisson model, we can find the
m.g.f. of $\mathcal{L}$, which makes a significant step towards this goal.

To do this, we consider what is often termed the record low process. Imagine we are
observing the surplus process and each time that we reach a new record low point in surplus,
we record the amount by which we ‘beat the record’, that is, how much we are below the
previous record low. Let $\mathcal{L}_n$ be the amount we record on the $n$th occasion the record is broken,
should this occur.

As an example, suppose $u = 10$ and $c = 1$. If we have a claim of size 5 at time 3, we have
a new record low surplus of 8, so $\mathcal{L}_1 = 2$. Suppose the next claim is 3 at time 7. Our surplus
is 9 at that time, so we do not have a record low. Then, suppose the next claim is 5 at time 8.
This gives us a new record low of 5, so we have $\mathcal{L}_2 = 3$, and so on. We may of course have
many more record lows, but suppose that we do not and a surplus of 5 is as low as we get.
Then the maximal aggregate loss is just 5, which is the sum of the two record low increments,
that is, $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2$.
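
The bookkeeping in this example is easy to automate. The following sketch (the function name and the representation of claims as `(time, amount)` pairs are assumptions made for illustration) walks through a list of claims, tracks the running surplus, and records the increment each time a new record low is set. Because the surplus only rises between claims, new record lows can occur only at claim times.

```python
def record_low_increments(u, c, claims):
    """Return the record low increments for one realization of the surplus process.

    u      : initial surplus
    c      : premium rate per unit time
    claims : list of (time, amount) pairs, sorted by time
    """
    surplus = u
    last_time = 0
    record = u                    # lowest surplus seen so far
    increments = []
    for time, amount in claims:
        surplus += c * (time - last_time) - amount   # premiums earned, then the claim
        last_time = time
        if surplus < record:
            increments.append(record - surplus)      # amount by which the record is beaten
            record = surplus
    return increments


if __name__ == "__main__":
    increments = record_low_increments(10, 1, [(3, 5), (7, 3), (8, 5)])
    print(increments, sum(increments))   # the sum is the maximal aggregate loss so far
```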

Observe that the random variable $\mathcal{L}_1$ is just $D(0)$, as is clear from our remarks in the
previous section. Similarly, it follows from the stationarity of the process that each $\mathcal{L}_n$ will
have the same distribution. How many new record lows will we encounter? Well, after any
new record low has occurred, the probability of another occurrence is just $\psi(0) = 1/(1 + \theta)$. If
$N$ is the number of record lows that occur, then $N \sim \mathrm{Geom}(1/(1 + \theta))$. Now $\mathcal{L}$, the maximal
aggregate loss, is simply the sum of the $\mathcal{L}_n$. So, we have

$$\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 + \cdots + \mathcal{L}_N.$$

By the independence in the process, this shows that $\mathcal{L}$ can be expressed as a compound
distribution $(N, D(0))$, where $N$ is geometric. We have already worked out the m.g.f. for a
compound geometric distribution in Example 21.2. As we noted in that example, in order to
have a chance of recognizing a compound distribution from its m.g.f. we will want to get rid
of the point mass at 0 and look at $\mathcal{L}^+ = \mathcal{L} \mid \mathcal{L} > 0$. From this example,

$$M_{\mathcal{L}^+}(r) = \frac{(1 - p)M_{D(0)}(r)}{1 - pM_{D(0)}(r)},$$

where $p = 1/(1 + \theta)$. Substituting from (23.24) for $M_{D(0)}$ and simplifying gives us

$$M_{\mathcal{L}^+}(r) = \frac{\theta[M_X(r) - 1]}{1 + (1 + \theta)E(X)r - M_X(r)}. \qquad (23.27)$$

Since $\mathcal{L}$ is a mixture of 0 and $\mathcal{L}^+$ with weights $\theta/(1 + \theta)$ and $1/(1 + \theta)$, respectively, we know
that

$$P(\mathcal{L} > u) = \frac{\theta}{1 + \theta}P(0 > u) + \frac{1}{1 + \theta}P(\mathcal{L}^+ > u).$$

The first term on the right is trivially equal to 0 for nonnegative $u$, yielding the key result

$$\psi(u) = \frac{1}{1 + \theta}P(\mathcal{L}^+ > u). \qquad (23.28)$$


We summarize the whole procedure of finding ruin probabilities by the maximal aggregate 
loss procedure: 


1. Calculate $M_X(r)$ and use (23.27) to find $M_{\mathcal{L}^+}(r)$.
2. Try to deduce from this the survival function of $\mathcal{L}^+$.
3. Calculate $\psi(u)$ directly from (23.28).


Step 2 is of course the difficult one. Will we actually be able to recognize the distribution
of $\mathcal{L}^+$ from its m.g.f.? Here is one particularly easy example.


Example 23.10 Use the method described above to find $\psi(u)$ when $X \sim \mathrm{Exp}(\beta)$.
Solution. Example 21.2 showed that $\mathcal{L}^+ \sim \mathrm{Exp}(\beta\theta/(1 + \theta))$, and from (23.28),

$$\psi(u) = \frac{1}{1 + \theta}\,e^{-u\beta\theta/(1 + \theta)},$$

giving us an alternate proof of Theorem 23.5.


Are there any other cases where we can recognize the distribution of £* from its m.g.f.? 
We make two key observations. First, from (23.27), if My(r) is a rational function (a quotient 
of two polynomials), then so is Mp+(r). Second, if X is a mixture of gamma distributions 
that have an integer for the first parameter, its m.g.f. will be a linear combination of rational 
functions, therefore rational, and hence M, (r) will be rational. So, the steps to handle such 
X are as follows. For such a distribution X, calculate M;+(r) and decompose it into partial 
fractions. Suppose 


Mes = Yow, (=) , 


i-i 


This shows that £* is a mixture of (Y,, Y>,..., Y,) with weights (wj, w2, ... , w,), where for 
each i, Y; ~ Gamma (ai, r;). From (23.28), we deduce that 


l 


ww) = (5) È "PO, > w). 


n 
i=1 
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Example 23.11 Find y(u), given that X has the density function 
1 =x —2x 
f(x) ==e*+e 
2 
and 0 = 7/9. 


Solution. X is a mixture of Exp(1) and Exp(2) distributions with equal weights, so 


$$M_X(r) - 1 = \frac{1}{2}\left(\frac{1}{1-r}\right) + \frac{1}{2}\left(\frac{2}{2-r}\right) - 1 = \frac{3r - 2r^2}{4 - 6r + 2r^2}.$$


As a check, this quantity must take the value 0 for r = 0. Substituting $M_X(r) - 1$ into (23.27)
gives


$$M_{L^+}(r) = \frac{7}{9}\left(\frac{9 - 6r}{7 - 18r + 8r^2}\right) = \frac{7(9 - 6r)}{9(1 - 2r)(7 - 4r)}.$$


The partial fraction expression for this is 


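$$\frac{14}{15}\left(\frac{1}{1 - 2r}\right) + \frac{1}{15}\left(\frac{7}{7 - 4r}\right).$$


So $L^+$ is a mixture of Exp(1/2) and Exp(7/4) distributions with weights 14/15 and 1/15,
respectively. It follows that


$$\psi(u) = \frac{9}{16}\left[\frac{14}{15}\,e^{-0.5u} + \frac{1}{15}\,e^{-1.75u}\right].$$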


What happened to our adjustment coefficient? It played a prominent role in our previous 
method but seems to be absent here. It is not, however. The denominator of (23.27) is precisely 
the quantity we set equal to 0 to find R. To find the partial fraction decomposition, we must 
find the roots of this equation, and the adjustment coefficient appears as the smallest such root. 
In Example 23.11, we found the two roots r = 1/2 and r = 7/4. We then know that R = 1/2.

How do we reconcile the other root of 7/4 with our previous statement that the adjustment 
coefficient was uniquely defined as the only positive root of the defining equation? The point 
is that $M_X(r) = E(e^{rX})$ may only be defined over a certain region. In this example, it is defined
only for $0 \le r < 1$, and if r > 1, then the expectation will not exist. However, the algebraic
expression that gives the m.g.f. over this range does make sense for values greater than 1. It 
just no longer represents an expectation of a function of X. The resulting algebraic expression 
can have other roots, as it does in this example. 

The procedure followed in the previous example can be used to write a general closed-form
formula for $\psi(u)$ when X is a mixture of two exponential distributions, as given by the
following theorem. We leave the details of the proof to the reader. 


Theorem 23.6 Suppose that in the compound Poisson surplus process, X is a mixture of
Exp($\alpha$) and Exp($k\alpha$), for some k > 1, with respective weights $\omega$ and $1 - \omega$.
(a) Let $\phi = 1 - \omega + \omega k$. The adjustment coefficient is $r\alpha$, where r is in the interval (0,1)
and is a solution to the quadratic equation


$$\frac{\phi - r}{(1 - r)(k - r)} = \frac{\phi(1+\theta)}{k}.$$

(b) Let


$$s = \frac{k\theta}{r(1+\theta)}.$$


Then $L^+$ is a mixture of Exp($r\alpha$) and Exp($s\alpha$) with respective weights


$$\frac{s(\phi - r)}{\phi(s - r)} \quad \text{and} \quad \frac{r(s - \phi)}{\phi(s - r)}.$$


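(c) From (23.28), $\psi(u)$ equals $1/(1+\theta)$ times the mixture of the survival functions $e^{-r\alpha u}$ and $e^{-s\alpha u}$, taken with the weights given in part (b).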
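The two-exponential case can be sketched numerically as follows. This is a minimal illustration, not the text's own derivation: it assumes the standard defining equation $M_X(R) - 1 = (1+\theta)E(X)R$ for the adjustment coefficient, clears fractions to obtain a quadratic in r, and then applies (23.28). The function name `ruin_two_exponential_mixture` and its argument names are hypothetical.

```python
import math


def ruin_two_exponential_mixture(alpha, k, omega, theta):
    """Claims are a mixture of Exp(alpha) and Exp(k*alpha), weights omega, 1-omega.

    Returns (r, s, p, psi): the two roots in units of alpha, the weight p on
    Exp(r*alpha) in the mixture for L+, and a function psi(u).
    Assumes the adjustment equation M_X(R) - 1 = (1 + theta) E(X) R.
    """
    phi = 1 - omega + omega * k
    # Clearing fractions in (phi - t)/((1 - t)(k - t)) = (1 + theta) phi / k
    # gives a2*t^2 + a1*t + a0 = 0.
    a2 = (1 + theta) * phi
    a1 = -((1 + theta) * phi * (1 + k) - k)
    a0 = k * phi * theta
    disc = math.sqrt(a1 * a1 - 4 * a2 * a0)
    r = (-a1 - disc) / (2 * a2)  # smaller root, gives the adjustment coefficient
    s = (-a1 + disc) / (2 * a2)  # larger root
    p = s * (phi - r) / (phi * (s - r))  # weight on Exp(r*alpha)

    def psi(u):
        # (23.28): psi(u) = P(L+ > u) / (1 + theta)
        tail = p * math.exp(-r * alpha * u) + (1 - p) * math.exp(-s * alpha * u)
        return tail / (1 + theta)

    return r, s, p, psi


# Data of Example 23.11: Exp(1) and Exp(2), equal weights, theta = 7/9.
r, s, p, psi = ruin_two_exponential_mixture(1.0, 2.0, 0.5, 7 / 9)
print(r, s, p, psi(0.0))
```

With the data of Example 23.11 the roots, the weight and $\psi(0)$ agree with the values 1/2, 7/4, 14/15 and 9/16 found there.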


Notes and references 


The inequality of Theorem 23.4 is an early result of ruin theory known as Lundberg's inequality.
See Bowers et al. (1997, Example 13.4.3) for a case where the adjustment coefficient
does not exist. An alternate method of handling the case of a mixture of gamma distributions
is by developing an integro-differential equation for $\psi(u)$ (see Klugman et al., 2012). There
is an extensive literature on ruin theory which goes beyond the treatment here. See Grandell
(1991) for some of the extensions. Several results concerning the distribution of the surplus
just before ruin and just after ruin were developed in Gerber and Shiu (1998). Similar results
along these lines are found in Powers (1995).


Exercises 


23.1 Consider the discrete-time surplus process $U_n = u + G_1 + G_2 + \cdots + G_n$, where the
$G_i$'s are independent and each distributed as a random variable G which takes the
values 1, 0, $-1$, $-2$ with probabilities 0.5, 0.2, 0.2, 0.1, respectively. Let $\psi(u)$ be the
probability of eventual ruin, starting with an initial surplus of u.


(a) If you start with an initial surplus of 1, what is the probability that you will be 
ruined by time 2 or before? 


(b) Given $\psi(6) = a$, $\psi(5) = b$, $\psi(4) = c$, express $\psi(7)$ in terms of a, b, c.


23.2 You are repeatedly flipping coins and will win 1 for a head and lose 1 for a tail. You
plan to stop playing when you get four consecutive wins, or after 100 tosses at the
very latest. You start with 10 units. Let $U_S$ denote the number of units you will have
upon stopping.

(a) Suppose that the probability of a head is 1/2. What is $E(U_S)$?


(b) Suppose again that the probability of a head is 1/2, but now you are given the
additional information that you began with three wins, followed by two losses.
What is $E(U_S)$ now?


(c) Suppose that the probability of a head is 1/3. What is $E(2^{U_S})$?


23.3 An insurer's portfolio consists of a single possible claim. You are given the following
information. The claim amount is uniformly distributed over (100, 500). The probability
that the claim occurs after time t is $e^{-0.1t}$ for t > 0. The claim time and amount
are independent. The insurer's initial surplus is 20. Premium income is received
continuously at the rate of 40 per year. Determine the probability of ruin.


23.4 (a) A random variable G takes the value 1 with probability 6/7 and $-1$ with probability
1/7. Show that the adjustment coefficient of G is log(6).


(b) A random variable G takes the value 1 with probability 1/2 and 2 with probability
1/2. Show that the adjustment coefficient is $\infty$.


23.5 You are repeatedly playing a game in which at each stage you win 1 with probability
15/19 or lose 2 with probability 4/19.


(a) Show that the adjustment coefficient is log(3/2).
(b) If $U_n$ denotes the amount you have at time n, show that $U_n$ is not a martingale.
(c) At time 20 you have a total of 4 units. What is $E((2/3)^{U_{50}})$?


(d) Suppose you start with 3 units at time 0. You decide to stop play when you have
a total of 20 units, or after 100 plays if that occurs earlier. If $U_S$ is the amount that
you will have after stopping play, what is $E((2/3)^{U_S})$?


23.6 You are playing a game repeatedly in which at each turn you either win 1 with
probability 12/13 or lose 2 with probability 1/13.


(a) Write an equation for the adjustment coefficient R, and verify that R = log(3).


(b) Show that for an initial stake of u, the probability of eventual ruin is between
$(1/3)^{u+2}$ and $(1/3)^{u+1}$.


23.7 Show that z in Example 23.5 approaches b/(a + b) as p approaches 1/2, verifying
the conclusion of Example 23.4.


23.8 Refer to Example 23.4.
(a) Show that $X_n = (U_n - a)^2 - n$ is a martingale.


(b) Use your result in part (a) to show that $E[S] = ab$.
23.9 This extends Exercise 23.8 to the case for which the probability of winning 1 is
$p \ne 1/2$, as in Example 23.5.


(a) Show that $g(U_n) - n$ is a martingale, where


Pr erg Fo od 
pr-1} Y p 


(b) Use your result in part (a) to show that


$$E[S] = \frac{b}{p-q} - \frac{a+b}{p-q}\cdot\frac{1 - (p/q)^{b}}{1 - (p/q)^{a+b}}.$$
23.10 Redo Example 20.9, now assuming that q = 1/7 and that X has a constant value of 3.


23.11 In the model of Section 23.5, let q equal 1/3, and let X take the values 1, 2, with equal
probability. Show that $\psi(k) = 1/4^{k+1}$ for all k.
The remaining exercises deal with the continuous-time surplus process:


$U(t) = u + ct - (N(t), X)$, where {N(t)} is a Poisson process at rate $\lambda$.


23.12 Answer the following parts separately:


(a) Suppose that the initial surplus is 10. How large should the adjustment coefficient
R be, to ensure that the probability of ruin will be less than 0.10?


(b) Suppose that the adjustment coefficient R = 2/9, $\lambda = 5$, and X is exponentially
distributed with mean 3. Find c.


(c) You are given that c = 20, $\lambda = 5$. All you know about X is that E(X) = 3. What is
the probability that the surplus will eventually drop below its initial value?


(d) Suppose that X takes some fixed value with certainty. Show that if the
surplus does drop below its initial value, the amount of deficit the first time this
occurs will be uniformly distributed.


23.13 You are given that X ~ Exp(3), c = 2, $\lambda = 3$. You want the probability of eventual ruin
to be no more than 0.05. How much initial surplus should you begin with?


23.14 You are given that X is uniformly distributed on the interval [0,2], the initial surplus
u = 10, and the adjustment coefficient R = log(3). 


(a) What is the probability that the surplus will eventually drop below 10? 


(b) Given that the surplus does eventually drop below 10, what is the probability that 
the first time this happens, the surplus will be between 8.5 and 9? 


23.15 Suppose that X is exponentially distributed. Is $\psi(2u)$ equal to, strictly less than, or
strictly greater than $\psi(u)^2$?
23.16 For any random variable A, let $R_A$ be the adjustment coefficient. That is, $R_A$ is the
positive solution of $\phi_A(r) = 0$, where $\phi_A(r) = M_A(-r) - 1$. Suppose that G and H are
random variables such that both adjustment coefficients exist and $R_G < R_H$. If K is a
mixture of G and H, show that


$$R_G \le R_K \le R_H$$


by using properties of the graph of the function $\phi_A$.
23.17 (a) If the adjustment coefficient R = 2, find u so that $\psi(u) < 0.05$.


(b) Suppose that X has a Gamma(2,3) distribution and that the adjustment coefficient
R = 1. Find $\theta$, the relative risk loading.


(c) Suppose that X is exponential with mean 2. If $\psi(0) = 0.8$, what is $\psi(1)$?


(d) Suppose that X is uniformly distributed on [0, 4] and that u = 5. Given that the
surplus eventually drops below 5, what is the probability that, the first time this
happens, the surplus is between 3 and 5?


23.18 Use the m.g.f. of the maximal aggregate loss $L$ to show that $E(L) = E(X^2)/(2\theta E(X))$.


23.19 For a surplus process with a compound Poisson claim process, you are given that
the adjustment coefficient is 0.25, and the claim amount has a density function $f(x) =
e^{-2x} + 2.5e^{-5x}$ for x > 0. If $L$ is the maximal aggregate loss, determine $P(L = 0)$.


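23.20 If X is a mixture of Exp($\alpha$) and Exp($\beta$) with weights $\omega$ and $1 - \omega$, respectively, find a
formula for $\psi(u)$ in each of the following cases:


(a) $\alpha = 3$, $\beta = 7$, $\omega = 1/2$, $\theta = 2/5$;
(b) $\alpha = 1$, $\beta = 2$, $\omega = 1/3$, $\theta = 4/11$;
(c) $\alpha = 3$, $\beta = 6$, $\omega = 1/9$, $\theta = 4/5$.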


23.21 Suppose that $\lambda = 3$, c = 3 and X has a density function $f(x) = 36xe^{-6x}$. Find a formula
for $\psi(u)$.
Credibility theory


24.1 Introductory material 


24.1.1 The nature of credibility theory


Suppose that X is the random variable denoting claims on an insurance policy during a specific
period, and an insurer charges E(X) as a net premium for each individual. Of course, a typical
group of purchasers is not homogeneous and there will be both bad and good risks within the
group, but these cannot normally be identified as such at the outset of the contract. This means
that some are paying more than they should be and some less. Suppose, however, that after
issuance of the policy, the insurer is presented with additional information on a policyholder,
usually as the result of claim experience, which provides some indication as to the degree of
risk. For example, suppose a purchaser of automobile insurance incurs a few costly claims. Does this
indicate that they are a poor driver, whose subsequent premiums should increase, or could they
really be a good driver, with the costly claims simply a result of bad luck? Credibility theory
deals with the problem of analyzing this additional information and deciding on how it should
be used to modify future premiums. The key question then is how ‘credible’ the additional
data are, which is the source of the name of this concept.


24.1.2 Information assessment 


We first explore the basic idea of utilizing information to reassess probabilities, beginning 
with a few simple examples. 


Example 24.1 A bag contains three dice. Two of these are standard, having sides marked
with one to six dots. A third is special, in that the side with one dot has been replaced with six 
dots, so there are two sides with six dots and none with one dot. A game consists of drawing 


Fundamentals of Actuarial Mathematics, Third Edition. S. David Promislow. 
a die at random from the bag and then tossing it. An amount of money equal to the resulting
face is paid to the participant.


(a) What is the fair price that a person should pay for one play of this game?


(b) Suppose now that after selecting the die, you are allowed to throw it once, observe the
result, and then decide if you want to play the game or not with this selected die. How
does this additional information change the fair price you would pay to play?


Solution. 


(a) If $X_1$ is the number that comes up after throwing a standard die, and $X_2$ the number
that comes up after throwing the special die, the resulting payoff X is a mixture of
$X_1$ and $X_2$ with respective weights of 2/3 and 1/3. It is clear that $E(X_1) = 21/6$ and
$E(X_2) = 26/6$, so the fair price is


E(X) = (2/3)(21/6) + (1/3)(26/6) = 34/9.


(b) The fair price will of course depend on the number we observe, for this will alter
the respective probabilities that we assign to the two types of dice. If, for example,
1 is thrown, then we know for certain that the selected die is standard, and the fair
price will change to 21/6. Suppose that a 6 is thrown. We now cannot be certain
which type of die was selected, but the observation clearly tends to provide evidence
that it was the special one with two 6 dot sides rather than 1. To find the fair price
we must reassess the respective probabilities of the selected die being either standard
or special. The tool for doing so is a well-known result in probability and statistics
known as Bayes theorem, which formally amounts to a simple calculation. Given two
events A and B you can sometimes easily infer P(A|B), and you want to use that to
compute the reverse conditional probability, namely P(B|A). (Refer to Section A.2 for
notation.) This is easily done via


$$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A \cap B)}{P(B)}\cdot\frac{P(B)}{P(A)} = \frac{P(A|B)\,P(B)}{P(A)}.$$


Bayes Theorem therefore says the following. Suppose we observe that event A has 
occurred. Our new assessment of the probability of any event B is proportional to 
the number obtained by taking the original probability of B and weighting it with the 
probability that event B would have caused us to observe the event A. The constant 
of proportionality is $P(A)^{-1}$. In the present example, the probability of a 6 on a
standard die is 1/6 as opposed to a probability of 2/6 for the special die. The original 
probabilities of (2/3, 1/3) for the selection of a standard or special die respectively, 
will under the observation of a 6 being thrown, change to something proportional to 
((2/3)(1/6), (1/3)(2/6)) = (2/18,2/18). We could compute the probability of the 
observation to be 2/9, in order to derive the constant of proportionality as 9/2, but 
there is no need to do so. Since we know that the two probabilities add to 1, they must 
each be 1/2. Our fair price is now 


E(Y) = (1/2)(21/6) + (1/2)(26/6) = 47/12. 
Finally, suppose a 2, 3, 4, or 5 is thrown. Since each of these has the same chance of
being thrown by either type of die, the weightings are equal. The original probabilities
and therefore the original fair price will remain unchanged.

To summarize, the fair price is equal to g(y), where y is the number thrown and g
is the function given by


g(1) = 7/2,  g(2) = g(3) = g(4) = g(5) = 34/9,  g(6) = 47/12.
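The reassessment in Example 24.1 can be reproduced with exact fractions. The sketch below is only an illustration of the prior-times-likelihood calculation described above; the names `faces`, `prior` and `fair_price` are hypothetical and not taken from the text.

```python
from fractions import Fraction as F

# Faces of each type of die; the special die has its 1 replaced by a second 6.
faces = {"standard": [1, 2, 3, 4, 5, 6], "special": [6, 2, 3, 4, 5, 6]}
# Two of the three dice in the bag are standard.
prior = {"standard": F(2, 3), "special": F(1, 3)}


def likelihood(die, y):
    """Probability that the given type of die shows y on one throw."""
    return F(faces[die].count(y), 6)


def mean(die):
    """Expected face value for the given type of die."""
    return F(sum(faces[die]), 6)


def fair_price(y=None):
    """Expected payoff, optionally after observing a first throw equal to y."""
    if y is None:
        post = prior
    else:
        # Bayes: posterior is proportional to prior times likelihood.
        unnorm = {d: prior[d] * likelihood(d, y) for d in prior}
        total = sum(unnorm.values())
        post = {d: unnorm[d] / total for d in unnorm}
    return sum(post[d] * mean(d) for d in post)


for y in range(1, 7):
    print(y, fair_price(y))
```

Calling `fair_price()` with no argument gives the price in part (a), and `fair_price(y)` gives the function g of part (b).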


Example 24.2 (The Monty Hall Problem) This example has nothing to do with insurance,
but we include it since it is perhaps one of the most famous examples on the theme of using
information to alter probabilities, and it indicates just how subtle the transfer of information can
be. It keeps on reappearing and never fails to baffle many people (including mathematicians
and others well versed in technical matters) who frequently get an incorrect answer. It is
named after the original host of a popular T.V. game show, as it is typical of the type of
decision making that the guests on this show sometimes had to make. A valuable prize is
randomly hidden behind one of three doors, and you will receive the prize if you can
guess the correct one. Suppose that after you choose a door, say number 1, the host opens
another door, say number 2, revealing that there is nothing behind it, and then allows you to
alter your choice if you so desire. Should you remain with your original guess of door 1, or
instead change your selection to the remaining door 3?


Solution. This is the way the problem is usually formulated, but there is possibly some room 
for ambiguity (which may explain some of the wrong answers). It is a crucial condition that 
the host is obligated to open a blank door on every guess. The situation changes drastically if 
there is an option of doing so or not. For example, if the strategy of the host was to show you 
a blank door only when your guess was right, in an attempt to talk you out of a prize, then 
obviously you should never switch. 

A great number of people think that you should not switch in any event. They reason that 
since there is always one blank door among the two remaining ones, the revelation gives you 
no information at all and you may just as well stick with your original choice. This is incorrect 
however under our stated condition. We can apply Bayes Theorem as we did above. Let $B_i$
denote the event that the prize is behind door i, and A the observed event that the host opens
door 2. We know that precisely one of the events $B_1$ or $B_3$ occurred and we have to decide
which is more likely. Now the original probabilities are 1/3 for each event. Note however that
$P(A|B_3) = 1$ since if the prize was behind door 3, the host had no choice but to open door 2,
in order to give you a guess. Without knowing the strategy the host uses to pick between two
doors that have nothing behind them, we can only say that $P(A|B_1)$ is some number p. The
new probabilities given the event A are then proportional to p for $B_1$ and 1 for $B_3$. Since p
cannot be bigger than 1 it is clear that you should switch.

As an example, in the case that the host picks randomly so that p = 1/2, the probability
that the prize is behind door 3 would now be 2/3 as opposed to 1/3 for door 1. Suppose 
instead that the host always picks the lowest numbered door when there are two choices. There 
is no gain by switching in the given scenario, but if the host had opened door 3 in response to 
your selection of door 1, you would be certain to win if you switched. In all cases then you 
cannot lose and will normally gain by switching. 
People go wrong on this since they fail to evaluate the extent of the information correctly.
The host is telling you much more than just the fact that one of the two remaining doors is the 
losing one. They are indeed naming one particular such door, and this lends evidence to the 
fact that the remaining door has the prize. 

Another fallacious line of reasoning is that the information tells you only that either door 
1 or door 3 is the winning one, and therefore you have no reason to switch. This would be 
correct had the host told you that door 2 had nothing behind it, before you made your choice. 
But the fact that you are told this after the choice, alters the respective probabilities, as Bayes 
Theorem shows. 

As an aside, there is a simple way to see that you should switch without the use of Bayes 
theorem. The effect of switching is to convert an original wrong guess into a correct one and 
an original correct guess into a wrong one. But since you will be wrong on the average 2/3 
of the time, it is definitely an advantage to switch. 
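The dependence on the host's strategy can be made concrete with a short calculation. The sketch below assumes the setting of the solution (you chose door 1, the host opened door 2, and the host opens door 2 with probability p when the prize is behind door 1); the function name is hypothetical.

```python
from fractions import Fraction


def posterior_after_host_opens_door_2(p):
    """Return (P(prize behind door 1 | A), P(prize behind door 3 | A)).

    A is the event that the host opens door 2 after you choose door 1, and
    p = P(A | prize behind door 1) describes the host's tie-breaking rule.
    """
    prior = Fraction(1, 3)
    weight_door_1 = prior * p  # host had a free choice and picked door 2
    weight_door_3 = prior * 1  # host was forced to open door 2
    # The prize cannot be behind door 2, so its weight is zero.
    total = weight_door_1 + weight_door_3
    return weight_door_1 / total, weight_door_3 / total


print(posterior_after_host_opens_door_2(Fraction(1, 2)))  # host picks at random
print(posterior_after_host_opens_door_2(Fraction(1)))     # host always opens lowest door
```

For every p between 0 and 1 the second probability is at least the first, which is the sense in which switching cannot hurt.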


We next present a situation, which is similar in nature to Example 24.1, but which ties in 
directly to our insurance application discussed in Section 24.1.1. 


Example 24.3 A population consists of 60% good drivers and 40% bad drivers. During any 
period, the claims for a good driver will be 10 with probability 0.2 and 0 with probability 0.8. 
The claims for a bad driver will be 10 with probability 0.5 and 0 with probability 0.5. 


(a) Calculate the net premium per period, assuming we cannot distinguish between good 
and bad drivers. 


(b) Suppose we observe that a particular driver had a claim of 10 in each of the first two 
periods. What is the expected claim amount for this driver in period 3? 


Solution. 


(a) The expected claims are 2 for a good driver and 5 for a bad driver. Since we have a 
mixture of good and bad drivers, the net premium is the overall expectation of 


2(0.6) + 5(0.4) = 3.2. 


(b) We argue exactly as in Example 24.1. The probability that a good driver will
produce the observation of a claim of 10 in each of the first two periods is $0.2^2 = 0.04$.
The probability that a bad driver will produce the observation is $0.5^2 = 0.25$. Therefore,
the original probabilities of (0.6, 0.4) for good and bad drivers respectively will
be revised for this driver to something proportional to ((0.6)(0.04), (0.4)(0.25)) =
(0.024, 0.1) and in order to add to 1, the two probabilities must be (6/31, 25/31). Our
revised expectation of third period claims is


2(6/31) + 5(25/31) = 137/31 ≈ 4.42.

Let us summarize the reasoning of a typical insurer in this situation, which will be familiar
to many readers who have had insurance premiums raised due to their claims experience. In
this instance, a possible conclusion for the insurer to make is that they are dealing with a bad
driver, and raise this individual’s premium for the next period to 5, if this is allowed by the
contract. This could be unfair however, since there is some possibility that the person is a
good driver with the two high claims occurring only by chance. Nonetheless there is enough
evidence to raise the premiums in the next period from 3.2 to 4.42.

To complete the analysis as in the dice example, we would want to know the expected
claims for the other three possibilities for claims of either 10 or 0 in the first two periods. We
leave this as an exercise for the reader.
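The revision of probabilities in Example 24.3 follows the same pattern for any claim history. The sketch below is a minimal illustration using the data of that example; the names `PRIOR`, `P_CLAIM`, `posterior` and `expected_next_claim` are hypothetical.

```python
from fractions import Fraction as F

PRIOR = {"good": F(3, 5), "bad": F(2, 5)}    # 60% good drivers, 40% bad
P_CLAIM = {"good": F(1, 5), "bad": F(1, 2)}  # probability of a claim of 10
CLAIM = 10


def posterior(claims):
    """Revised probabilities of good/bad after observing a list of period claims."""
    unnorm = {}
    for kind, pr in PRIOR.items():
        like = F(1)
        for c in claims:
            # Each period is a claim of 10 or a claim of 0, independently
            # once the type of driver is fixed.
            like *= P_CLAIM[kind] if c == CLAIM else 1 - P_CLAIM[kind]
        unnorm[kind] = pr * like
    total = sum(unnorm.values())
    return {kind: w / total for kind, w in unnorm.items()}


def expected_next_claim(claims):
    """Expected claims in the next period given the observed history."""
    post = posterior(claims)
    return sum(post[kind] * CLAIM * P_CLAIM[kind] for kind in post)


print(expected_next_claim([]))        # no history: the net premium of part (a)
print(expected_next_claim([10, 10]))  # the history of part (b)
```

Passing the other claim histories, such as `[10, 0]` or `[0, 0]`, gives the remaining cases mentioned above.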


24.2 Conditional expectation and variance with respect to another random variable


24.2.1 The random variable E(X|Y) 
We refer back to Section 20.13.2. 


Definition 24.1 Given discrete random variables X and Y, defined on a sample space, we define the
random variable E(X|Y) to be $E_\Pi(X)$, where $\Pi$ is the partition of the sample space according to the values of Y.
In other words, for a point $\omega$ such that $Y(\omega) = y$,


$$E(X|Y)(\omega) = E(X|Y = y).$$


The reader is invited to show that $E(X|I_B)$, where $I_B$ is the indicator random variable as
defined in Section A.5, takes the value E(X|B) (as defined in Section A.7) on the set B and 0
on the complement of B.


Example 24.4 Suppose that $\Omega$ consists of four points $\omega_1, \omega_2, \omega_3, \omega_4$ with probabilities 0.4,
0.3, 0.2, 0.1, respectively, and that


$$Y(\omega_1) = Y(\omega_2) = 1, \quad Y(\omega_3) = Y(\omega_4) = 2, \quad X(\omega_i) = i, \quad i = 1, 2, 3, 4.$$


Describe the random variable E(X|Y).


Solution. The set {Y = 1} consists of the two points $\omega_1$ and $\omega_2$ with conditional probabilities
equal to 4/7 and 3/7, respectively. Therefore E(X|Y) takes the value of 4/7 + 2(3/7) = 10/7
on $\omega_1$ and $\omega_2$. Similarly it takes the value of 3(2/3) + 4(1/3) = 10/3 on $\omega_3$ and $\omega_4$.
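On a finite sample space, E(X|Y) can be computed mechanically by averaging X over each set on which Y is constant. The sketch below does this for the data of Example 24.4; the function name `conditional_expectation` and the dictionary layout are hypothetical choices.

```python
from fractions import Fraction as F


def conditional_expectation(prob, X, Y):
    """Return a dict giving the value of E(X|Y) at each sample point.

    prob, X and Y are dicts keyed by sample point.
    """
    num, den = {}, {}
    for point, p in prob.items():
        y = Y[point]
        num[y] = num.get(y, 0) + p * X[point]  # sum of x * P(point) over {Y = y}
        den[y] = den.get(y, 0) + p             # P(Y = y)
    return {point: num[Y[point]] / den[Y[point]] for point in prob}


# Data of Example 24.4: sample points are labelled 1 to 4.
prob = {1: F(4, 10), 2: F(3, 10), 3: F(2, 10), 4: F(1, 10)}
X = {i: i for i in prob}
Y = {1: 1, 2: 1, 3: 2, 4: 2}

cond = conditional_expectation(prob, X, Y)
print(cond)
```

The result is constant on each of the sets {Y = 1} and {Y = 2}, as a function of Y must be.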


For another example, which will indicate how this concept ties in with credibility theory, 
we leave the reader to verify that in Example 24.1 where Y is the observed number on the
first throw and X is the final outcome, the random variable E(X|Y) is equal to g(Y) where g is 
the function constructed in that example. 

We motivate the use of E(X|Y) by looking at a problem of prediction. Suppose we want to
predict the outcome of a random experiment which is modeled by the random variable X. Our
method of doing so must obviously depend on the objective that we have in mind. For a simple
example, suppose that X takes the value 0 with probability 0.4 and 100 with probability 0.6.
If our objective is to maximize the probability of an exact answer, we would clearly predict
100, but of course that means that we will be far off 40% of the time. Suppose our objective is
to be ‘close’ on the average. This criterion demands that we clarify what we mean by ‘close’.
Assume we select the time honoured way of measuring closeness in statistics, which is via
squared deviation. In this context we should choose as a prediction that number b such that
$E[(X - b)^2]$ is minimal. Let $\mu$ denote E(X) and consider the identity


$$E[(X - b)^2] = E[(X - \mu)^2] + (b - \mu)^2, \qquad (24.1)$$


which is easily derived by expanding the right-hand side. From this, it is evident that we
choose $b = \mu$, as we might well have guessed without any calculation.
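For readers who want the intermediate step, identity (24.1) can also be obtained by splitting X − b around $\mu$. The lines below are a sketch of that calculation; it uses only linearity of expectation and the fact that $E(X - \mu) = 0$.

```latex
\begin{align*}
E[(X-b)^2] &= E\big[\big((X-\mu) - (b-\mu)\big)^2\big] \\
           &= E[(X-\mu)^2] - 2(b-\mu)\,E[X-\mu] + (b-\mu)^2 \\
           &= E[(X-\mu)^2] + (b-\mu)^2 .
\end{align*}
```

The first term does not involve b, so the sum is smallest exactly when $b = \mu$.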

Now suppose that before making our prediction we can observe the outcome of another 
random variable Y and then base our prediction on the value of Y. Our prediction Z then 
will be a random variable, since it depends on the random value of Y. We will show that the 
least squares minimizing choice for Z is the random variable E(X|Y). That is, E[X|Y] can be
viewed as that random variable which is a function of Y, and which gives, in some sense, the 
best estimate of a value of X after being given a value of Y. 

In the rest of this section, when Y is fixed, we will write E[X|Y] as $\hat{X}$ for convenience in
notation. In addition to the general results given in Theorem 20.6 we need some additional
facts dealing with the relationship between the two random variables.


Theorem 24.1
(a) If X is a function of Y, then $\hat{X} = X$.
(b) If X and Y are independent, then $\hat{X}$ takes the constant value of E(X).


(c) $\hat{X}$ is a function of Y that minimizes the expected squared deviation from X.


Proof. 


(a) If X is a function of Y, then its value is constant on the sets {Y = y}. Part (a) follows
immediately from Theorem 20.6(b), taking V to be the random variable with constant
value 1.


(b) We make use of probability functions. On any set {Y = y}, $\hat{X}$ will have the constant
value

$$\sum_x x\,\frac{f_{X,Y}(x, y)}{f_Y(y)} = \sum_x x\,\frac{f_X(x) f_Y(y)}{f_Y(y)} = \sum_x x f_X(x) = E(X).$$
(c) We will derive the following, which will imply the desired result. If Z is a function of
Y, then


$$E[(X - Z)^2] = E[(X - \hat{X})^2] + E[(\hat{X} - Z)^2].$$
Using linearity, the right-hand side expands to


$$E(X^2) - 2E(X\hat{X}) + E(\hat{X}^2) + E(\hat{X}^2) - 2E(Z\hat{X}) + E(Z^2).$$
From Equation (20.25)
$$E(X\hat{X}) = E(\hat{X}^2), \qquad E(Z\hat{X}) = E(ZX),$$


and substitution into the expression above gives the left-hand side.


Remark The results above remain true in much more generality than for discrete random 
variables and will be so applied in the material that follows. The main problem is that when 
Y is continuous, a set B of the form Y = y will have probability zero, and we cannot directly 
define the conditional probability P(·|B). In the case where a joint density function $f_{X,Y}$ exists,
this problem can be handled, since we can define the conditional density function on such a set
B as $f_{X,Y}(x, y)/f_Y(y)$ and things will work out as above. A full treatment, however, applicable
to general random variables requires knowledge of some advanced topics, such as measure 
theory, and we will not do this here. 


We conclude this section with a generalization. In what follows we will deal with cases 
where instead of making an observation of just one random variable before making our 
prediction of X we may observe n random variables. In fact we have already encountered this 
situation in Example 24.3 where we had two observations. This means that our observation is 
not just a random variable but a random vector: 


$$\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n),$$


where each $Y_i$ is a random variable. We can define the random variable $E(X|\mathbf{Y})$ as $E_\Pi(X)$ where
$\Pi$ is just the partition of the sample space into the sets on which the vector $\mathbf{Y}$ is constant. As
in the case where n = 1, $E(X|\mathbf{Y})$ will be that random variable which is equal to $g(\mathbf{Y})$ for some
function g of n variables and which minimizes $E[(X - Z)^2]$ over all random variables Z which
are functions of $\mathbf{Y}$.


24.2.2 Conditional variance 
We now introduce another important random variable. 
Definition 24.2 We define the conditional variance of X given Y, denoted by Var(X|Y), as
the random variable that takes the value Var(X|Y = y) on a point $\omega$ of the sample space such
that $Y(\omega) = y$.

An equivalent formulation is to write

$$\mathrm{Var}(X|Y) = E[(X - \hat{X})^2 \mid Y]. \qquad (24.2)$$
A major use of this concept is in the following decomposition of variance formula. 


Theorem 24.2 


Var(X) = E[Var(X|Y)] + Var[E(X|Y)] (24.3) 
Proof. Let
$$w = \mathrm{Var}(\hat{X}), \quad v = E[\mathrm{Var}(X|Y)], \quad \mu = E(X) = E(\hat{X}).$$
Use linearity to expand the right-hand side of (24.2) to get
$$\mathrm{Var}(X|Y) = E(X^2|Y) - 2E(X\hat{X}|Y) + E(\hat{X}^2|Y).$$


Now $\hat{X}^2$, being constant on the sets {Y = y}, is clearly equal to $E[\hat{X}^2|Y]$, and Theorem 20.6(b)
shows that $\hat{X}^2$ is equal to $E[X\hat{X}|Y]$. Making these substitutions and taking expectations gives


$$v = E(X^2) - E(\hat{X}^2).$$
We also have
$$w = E(\hat{X}^2) - \mu^2.$$
Adding the last two equations gives


$$\mathrm{Var}(X) = w + v$$


to complete the proof.


Remark We have encountered particular cases of this decomposition previously. The first 
was for the variance of deferred annuity values given in formula (15.16). In that case we used 
Y in place of X and the conditioning random variable was the indicator random variable for 
survival to time m. The second was the frequency-severity decomposition given in formula 
(21.3). In that case we had S in place of X and N in place of Y. 


Example 24.5 Verify the decomposition of variance formula for Example 24.3 with Y as 
the random variable that takes the value of 1 for a good driver or 2 for a bad driver. 


Solution. In this example X takes the value 10 with probability (0.2)(0.6) + (0.5)(0.4) = 0.32, 
and the value 0 with probability 0.68. Therefore 
Var(X) = 100(0.32)(0.68) = 21.76. 


(A variance of a two-valued random variable is simply the square of the difference, multiplied 
by the product of the two probabilities). 
E(X|Y) takes the value 2 when Y = 1 or 5 when Y = 2. Therefore


Var[E(X|Y)] = $3^2$(0.4)(0.6) = 2.16.
Now (X|Y = 1) takes the value 10 and 0 with probabilities 0.2 and 0.8, respectively, so
Var(X|Y = 1) = 100(0.2)(0.8) = 16, and similarly Var(X|Y = 2) = 100(0.5)(0.5) = 25. So


E[Var(X|Y)] = 16(0.6) + 25(0.4) = 19.6.

Since 19.6 + 2.16 = 21.76, the formula is verified.
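The same verification can be carried out exactly in a few lines. The sketch below recomputes the three quantities in (24.3) for the data of Example 24.3; the variable names (`pi`, `f`, `v`, `w`, `total_var`) are hypothetical labels chosen to mirror the notation of the proof.

```python
from fractions import Fraction as F

pi = {1: F(3, 5), 2: F(2, 5)}            # P(Y = 1) good driver, P(Y = 2) bad driver
f = {1: {0: F(4, 5), 10: F(1, 5)},       # distribution of X given Y = 1
     2: {0: F(1, 2), 10: F(1, 2)}}       # distribution of X given Y = 2


def mean(dist):
    return sum(x * p for x, p in dist.items())


def var(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())


cond_mean = {y: mean(f[y]) for y in pi}  # values of E(X|Y)
cond_var = {y: var(f[y]) for y in pi}    # values of Var(X|Y)

v = sum(pi[y] * cond_var[y] for y in pi)                # E[Var(X|Y)]
mu = sum(pi[y] * cond_mean[y] for y in pi)              # E(X)
w = sum(pi[y] * (cond_mean[y] - mu) ** 2 for y in pi)   # Var[E(X|Y)]

# Unconditional distribution of X, obtained by mixing over Y.
marginal = {}
for y in pi:
    for x, p in f[y].items():
        marginal[x] = marginal.get(x, 0) + pi[y] * p
total_var = var(marginal)

print(float(v), float(w), float(total_var))
```

Working with `Fraction` keeps the check exact, so the equality of `total_var` and `v + w` is not blurred by rounding.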


We return again to the theme of transfer of information. The quantities w and v are 
useful tools in measuring the information about X that we get from observing Y. Before the 
observation, we start out with some uncertainty as measured by Var (X). We may view w as 
representing that part of the uncertainty that is removed by the information we get from being 
told the value of Y, leaving the uncertainty of v. A high value of w tends to increase the value
of the information. On the other hand when v is high, it means that on average Var(X|Y = y) 
is high. This suggests that even knowing the value of Y, there is a great deal of uncertainty in 
the value of X. A high value of v then tends to diminish the value of the information. 

Both w and v depend on the units used in measuring X and Y. We can make use of a
similar idea as we used in Definition 15.3 to get a quantity independent of units by defining

$$k = \frac{v}{w},$$

which we will refer to as the information coefficient of Y with respect to X. It will play a major
role in the sequel. The lower the value of k, the more useful the information is to us. Consider 
the extreme cases. 


(i) When X and Y are independent, then $\hat{X}$ has the constant value of $\mu$, so w = 0. We get
absolutely no information from knowing Y, as reflected by the value of $k = \infty$.


(ii) X is a function of Y. Now $\hat{X} = X$, so w = Var(X) and v = 0. The information removes
all our uncertainty. This is reflected by the value of k = 0.


24.3 General framework for Bayesian credibility 


Now, having given some basic examples and theory, we will go back and look in detail at the 
original insurance situation. We start with a random variable X that represents total claims 
in a period from an insured individual. We make the assumption that individuals can be
distinguished by a risk parameter $\Theta$ which varies over some subset of the real numbers, but
the value of $\Theta$ is unknown for each particular individual. $\Theta$ then constitutes another random
variable. We assume however that we have the following information:

$f(x; \theta) = \dfrac{f_{X,\Theta}(x, \theta)}{f_\Theta(\theta)}$, the probability or density function of $(X|\Theta)$,

$\pi(\theta)$, a probability or density function for $\Theta$.


So, for example, where X is discrete, f(x; θ) would give the probability that for an individual 
with risk parameter θ, the value of X will be x. 
The probability or density function of X can then be written as 

$$f_X(x) = \sum_{\theta} f(x; \theta)\,\pi(\theta)$$

when Θ is discrete or 

$$f_X(x) = \int f(x; \theta)\,\pi(\theta)\,d\theta$$

when Θ is continuous. 
Let us show how this notation fits into Example 24.3. The two values of Θ were 1 
and 2, denoting good and bad drivers, respectively, and the distribution of Θ was given by 

π(1) = 0.6,  π(2) = 0.4. 

We then needed two probability functions, which were 

f(0; 1) = 0.8,  f(10; 1) = 0.2, 

and 

f(0; 2) = 0.5,  f(10; 2) = 0.5. 


Suppose we observe the claims of a certain individual over n periods. We then have a 
random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ where $X_i$ denotes the claims in the ith period. Our 
assumptions are that each $X_i$ is distributed as X and that $(X_1, X_2, \ldots, X_n)$ are independent, 
conditional on Θ. 


Remark It is important to note that the random variables $X_i$ themselves will not in general 
be independent. Referring again to Example 24.3, if some $X_i$ has a value of 10, this is evidence 
that we have a bad driver, and this will affect the likelihood that the other values $X_j$ will be 
high. In general (when, say, X is discrete), the probability that $X_1 = x_1$ and $X_2 = x_2$, given that 
Θ = θ, is equal to $f(x_1; \theta)f(x_2; \theta)$, but the probability that $X_1 = x_1$ and $X_2 = x_2$ is not 
necessarily equal to $f_X(x_1)f_X(x_2)$. 


Our overall goal is to estimate the claims in the next period. That is, we want to determine 
the number $E(X_{n+1}|\mathbf{X} = \mathbf{x})$ which, as we have seen, will give us the least squares minimizing 
prediction. We usually will write this as $E(X_{n+1}|\mathbf{x})$, with $\mathbf{X}$ being understood. This number is 
known as the Bayesian credibility premium. We will not calculate it directly in the form given, 
however, but rather use the procedure that we have illustrated in Examples 24.1 and 24.3, 
where we make use of the random variable Θ as an intermediary, which usually simplifies the 
computation. Formally we are calculating $E[E(X|\Theta)|\mathbf{x}]$. 

To summarize, the procedure is as follows. 


Step 1: Calculate a revised distribution of Θ. (This is known as a posterior distribution 
as opposed to the original distribution, which is known as the prior.) Following 
Bayes' Theorem, we do this by multiplying original probabilities or densities by 
the probability or density that a parameter of θ would cause the observed sample. 
For the latter, we invoke the assumption of independence of the observations given 
the value of Θ to write the new probability or density function as 

$$\pi^*(\theta) \propto \pi(\theta) \prod_{i=1}^{n} f(x_i; \theta).$$

(The symbol ∝ stands for 'proportional to' and means that there is some 
constant (i.e. a value independent of the variable(s)) multiplying the right-hand 
side. Since this constant is determined uniquely by the fact that the function on 
the left side sums or integrates to 1, we normally do not need to write it down 
exactly.) 


Step 2: Identify the random variable X* = E*(X|Θ), where * indicates that we are using 
the posterior distribution found in step 1. 


Step 3: Calculate the credibility premium as E(X*). 
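
The three steps can be sketched in a few lines of code when Θ and X are both discrete. The sketch below is only an illustration under that assumption; the function name `bayes_premium` and the dictionary layout are our own choices. The inputs are the figures of Example 24.3, and the observed sample is (10, 10).

```python
from fractions import Fraction as F

def bayes_premium(prior, pmf, observations):
    """Bayesian credibility premium for a discrete risk parameter.

    prior: dict theta -> pi(theta)
    pmf:   dict theta -> dict x -> f(x; theta)
    """
    # Step 1: posterior proportional to prior times likelihood of the sample.
    weights = {}
    for theta, p in prior.items():
        likelihood = F(1)
        for x in observations:
            likelihood *= pmf[theta].get(x, F(0))
        weights[theta] = p * likelihood
    total = sum(weights.values())  # assumed positive: the sample must be possible
    posterior = {theta: wt / total for theta, wt in weights.items()}

    # Step 2: X* = E(X | Theta), one value per theta.
    cond_mean = {theta: sum(x * p for x, p in pmf[theta].items())
                 for theta in prior}

    # Step 3: expectation of X* under the posterior.
    premium = sum(posterior[theta] * cond_mean[theta] for theta in prior)
    return posterior, premium

prior = {1: F(6, 10), 2: F(4, 10)}
pmf = {1: {0: F(8, 10), 10: F(2, 10)},
       2: {0: F(5, 10), 10: F(5, 10)}}
posterior, premium = bayes_premium(prior, pmf, [10, 10])
print(posterior, premium)
```

Two claims of 10 shift most of the posterior weight to the bad-driver class, so the premium moves from the prior mean of 3.2 towards 5.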


24.4 Classical examples 
To illustrate the procedure above we will give two classical examples. 


Example 24.6 (Normal-Normal) Suppose that for each θ, X is normal with mean θ and 
variance v, while Θ is itself normal with mean μ and variance w. Find the credibility premium. 


Solution. Consider first the case when v = 1. We then have 

$$f(x; \theta) \propto e^{-\frac{1}{2}(x-\theta)^2}$$

and 

$$\pi(\theta) \propto e^{-\frac{1}{2w}(\theta-\mu)^2}.$$

To simplify the calculation, we will use the following fact. Given any n numbers 
$x_1, x_2, \ldots, x_n$, let $\bar{x} = (1/n)\sum_{i=1}^{n} x_i$. Then for any number c 

$$\sum_{i=1}^{n}(x_i - c)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - c)^2. \qquad (24.4)$$

This follows by applying (A.9) with X as a random variable that takes the value $x_i - c$ for 
i = 1, 2, ..., n, each with probability 1/n. Rearrange and multiply by n. 


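Step 1: 

$$\pi^*(\theta) \propto e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i-\theta)^2}\, e^{-\frac{1}{2w}(\theta-\mu)^2} = e^{-A/2}, \qquad (24.5)$$

where 

$$\begin{aligned}
A &= \sum_{i=1}^{n}(x_i-\theta)^2 + w^{-1}(\theta-\mu)^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\theta)^2 + w^{-1}(\theta-\mu)^2 \\
  &= \sum_{i=1}^{n}(x_i-\bar{x})^2 + n\bar{x}^2 + w^{-1}\mu^2 + \theta^2(n + w^{-1}) - 2\theta(n\bar{x} + \mu w^{-1}).
\end{aligned}$$

We use (24.4) for the second equality above. Let 

$$w^* = (n + w^{-1})^{-1}$$

and 

$$\mu^* = (n\bar{x} + \mu w^{-1})w^* = \frac{w^{-1}}{n + w^{-1}}\,\mu + \frac{n}{n + w^{-1}}\,\bar{x}. \qquad (24.6)$$

In the above expression for A, simplify by completing the square for the terms in 
θ, then lump together everything else as a constant, to obtain 

$$A = \frac{(\theta - \mu^*)^2}{w^*} + K,$$

where K is independent of θ. From (24.5), 

$$\pi^*(\theta) \propto e^{-\frac{(\theta-\mu^*)^2}{2w^*}},$$

so we have succeeded in identifying the posterior distribution of Θ as again being 
normal but now with mean μ* and variance w*. 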


Step 2: Since Θ is the mean of X, this means that X* = E*(X|Θ) = Θ, equipped with the 
posterior distribution. 


Step 3: The credibility premium is E*(Θ) = μ*. 


Consider now the general case where X has variance v instead of 1. Then the random 
variable $\tilde{X} = X/\sqrt{v}$ has variance 1. It will have mean $\Theta/\sqrt{v}$, which is normally distributed 
with mean $\mu/\sqrt{v}$ and variance w/v. From (24.6), 

$$E(\tilde{X}_{n+1}|\tilde{\mathbf{x}}) = \left(\frac{v/w}{n + v/w}\right)\frac{\mu}{\sqrt{v}} + \left(\frac{n}{n + v/w}\right)\frac{\bar{x}}{\sqrt{v}}.$$

To verify the last term, keep in mind that we must divide our observed values by $\sqrt{v}$ to get 
values of $\tilde{X}$ rather than X. We then multiply by $\sqrt{v}$ to get 

$$E(X_{n+1}|\mathbf{X} = \mathbf{x}) = \left(\frac{v/w}{n + v/w}\right)\mu + \left(\frac{n}{n + v/w}\right)\bar{x}.$$

Let 

$$k = \frac{v}{w}, \qquad Z = \frac{n}{n+k}.$$

This gives the convenient form 

$$E(X_{n+1}|\mathbf{x}) = (1 - Z)\mu + Z\bar{x}. \qquad (24.7)$$

Note also that E[X|Θ] = Θ so that Var(E[X|Θ]) = w. Moreover Var(X|Θ) is a constant v, 
so that E[Var(X|Θ)] = v. The constant k then is the information coefficient of Θ relative to X 
as we defined in Section 24.2.2. 

In this case we obtain a very satisfactory prediction of the next value of X as a weighted 
average of the overall mean μ and the mean of our observed values. The quantity Z is 
known as the credibility factor. The higher the value of Z, the more weight is given to the 
observations, as opposed to our prior estimate. In this case, it behaves just as we would 
expect. As k decreases, which means that more information is obtained from an observation, 
Z increases. Of course Z also increases as n increases. This is natural enough, since with a 
large number of observations, we can expect the data to be a reliable source of information. 
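
A small numerical sketch may help confirm that the posterior mean and the credibility form agree. It assumes the Normal-Normal setup of this example; the posterior variance for general v is obtained by the same rescaling argument, and the function name and the sample figures are hypothetical.

```python
def normal_normal_premium(mu, w, v, xs):
    """Posterior mean of Theta and the credibility form (1 - Z) mu + Z xbar.

    mu, w: prior mean and variance of Theta
    v:     variance of X given Theta
    xs:    observed claims
    """
    n = len(xs)
    xbar = sum(xs) / n
    # Posterior variance and mean: the case v = 1 rescaled to general v.
    post_var = 1.0 / (n / v + 1.0 / w)
    post_mean = post_var * (n * xbar / v + mu / w)
    # Credibility form with k = v / w and Z = n / (n + k).
    k = v / w
    z = n / (n + k)
    credibility_form = (1 - z) * mu + z * xbar
    return post_mean, post_var, z, credibility_form

# Hypothetical figures: prior mean 100, w = 25, v = 100, three observations.
post_mean, post_var, z, cred = normal_normal_premium(
    100.0, 25.0, 100.0, [110.0, 120.0, 130.0])
print(post_mean, cred, z)
```

Here k = 4 and n = 3, so the observed mean of 120 receives weight 3/7 and the prior mean receives weight 4/7.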


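Example 24.7 (Poisson-Gamma) Suppose that 

$$f(x; \theta) = \frac{\theta^x e^{-\theta}}{x!}, \qquad x = 0, 1, 2, \ldots,$$

a Poisson distribution with mean θ, and 

$$\pi(\theta) \propto \theta^{\alpha-1} e^{-\beta\theta},$$

a Gamma distribution with parameters α and β. Find the credibility premium. 


Solution. 


Step 1: 

$$\pi(\theta|\mathbf{x}) \propto f(x_1;\theta)f(x_2;\theta)\cdots f(x_n;\theta)\,\pi(\theta) \propto \theta^{x_1}e^{-\theta}\cdots\theta^{x_n}e^{-\theta}\,\theta^{\alpha-1}e^{-\beta\theta} \propto \theta^{\alpha + n\bar{x} - 1}e^{-(\beta+n)\theta}.$$

The posterior distribution of Θ is again a Gamma distribution but now 
with parameters α* = α + n x̄ and β* = β + n. 


Step 2: X* = E(X|Θ) = Θ, so X* has a Gamma distribution with parameters α* and β*. 


Step 3: From (A.55), 

$$E(X^*) = \frac{\alpha^*}{\beta^*} = \left(\frac{\beta}{n+\beta}\right)\frac{\alpha}{\beta} + \left(\frac{n}{n+\beta}\right)\bar{x},$$

which is exactly the form of (24.7) with credibility factor Z = n/(n + β). 


Note also that 

$$w = \operatorname{Var}[E(X|\Theta)] = \operatorname{Var}(\Theta) = \frac{\alpha}{\beta^2}$$

and, since the variance of a Poisson is the same as its mean, 

$$v = E[\operatorname{Var}(X|\Theta)] = E(\Theta) = \frac{\alpha}{\beta}.$$

It follows that the information coefficient of Θ with respect to X is v/w = β, so the credibility 
factor is of the same form as found in Example 24.6. 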
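
The Poisson-Gamma update is simple enough to check by machine. The following sketch assumes integer or rational parameters so that exact fractions can be used; the function name and the sample counts are hypothetical.

```python
from fractions import Fraction as F

def poisson_gamma_update(alpha, beta, counts):
    """Posterior Gamma parameters and premium for Poisson claims, Gamma prior."""
    n = len(counts)
    alpha_star = alpha + sum(counts)      # alpha* = alpha + n * xbar
    beta_star = beta + n                  # beta*  = beta + n
    premium = F(alpha_star) / F(beta_star)

    # The same premium in credibility form, with Z = n / (n + beta).
    z = F(n) / (n + F(beta))
    xbar = F(sum(counts), n)
    credibility_form = (1 - z) * F(alpha) / F(beta) + z * xbar
    return alpha_star, beta_star, premium, credibility_form

# Hypothetical prior Gamma(3, 2) and four observed claim counts.
alpha_star, beta_star, premium, cred = poisson_gamma_update(3, 2, [1, 0, 2, 1])
print(alpha_star, beta_star, premium, cred)
```

The two ways of computing the premium agree exactly, which is the content of Step 3.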
24.5 Approximations 


In many cases, we cannot apply the exact methods shown above since we may only have 
partial information about the distributions. Even in cases where we do know the distributions 
exactly, the computation may be too complex to be feasibly carried out. We seek approximate 
methods. 

Considering Examples 24.6 and 24.7, we might ask if we can approximate the revised 
expected value of $X_{n+1}$ as a linear combination of the observations and the original expected 
value. We look therefore for constants $\alpha_0, \alpha_1, \ldots, \alpha_n$ such that we can approximate $E(X_{n+1}|\mathbf{X})$ 
by the random variable 

$$\hat{X}_{n+1} = \alpha_0 + \sum_{i=1}^{n} \alpha_i X_i. \qquad (24.8)$$


24.5.1 A general case 


In view of our least squares minimization result of the section above, it is natural to choose 
the coefficients $\alpha_i$ so as to minimize the squared error between $X_{n+1}$ and $\hat{X}_{n+1}$. In other words, 
rather than choosing our minimizing function g from all possible functions of n variables, we 
confine ourselves to functions of the form 

$$g(X_1, X_2, \ldots, X_n) = \alpha_0 + \sum_{i=1}^{n} \alpha_i X_i.$$

In cases such as Examples 24.6 or 24.7 above, when the actual credibility premium is already 
of this form, our approximation will necessarily be exact. 
As a simplification we demonstrate the procedure with n = 2. We want to minimize 

$$Q = E[(X_3 - \alpha_0 - \alpha_1 X_1 - \alpha_2 X_2)^2].$$


By the convexity of the square function, we can minimize by setting the partial derivatives of 
Q equal to 0. For the derivative of Q with respect to $\alpha_0$ this gives 

$$E(X_3) = \alpha_0 + \alpha_1 E(X_1) + \alpha_2 E(X_2). \qquad (24.9)$$

Taking the derivative of Q with respect to $\alpha_1$ and setting this equal to 0 gives 

$$E(X_3X_1) = \alpha_0 E(X_1) + \alpha_1 E(X_1X_1) + \alpha_2 E(X_2X_1). \qquad (24.10)$$

Multiply (24.9) by $E(X_1)$ and subtract from (24.10) to get 

$$\operatorname{Cov}(X_3, X_1) = \alpha_1 \operatorname{Cov}(X_1, X_1) + \alpha_2 \operatorname{Cov}(X_2, X_1).$$

Similarly, setting the derivative with respect to $\alpha_2$ equal to 0 leads to 

$$\operatorname{Cov}(X_3, X_2) = \alpha_1 \operatorname{Cov}(X_1, X_2) + \alpha_2 \operatorname{Cov}(X_2, X_2).$$

For general n, the procedure is similar and we get a system of equations 

$$\operatorname{Cov}(X_j, X_{n+1}) = \alpha_j \operatorname{Var}(X_j) + \sum_{i \neq j} \alpha_i \operatorname{Cov}(X_i, X_j), \qquad j = 1, \ldots, n, \qquad (24.11)$$

$$\alpha_0 = (1 - s)\mu, \qquad (24.12)$$

where μ = E(X) and $s = \sum_{i=1}^{n} \alpha_i$. If we know the covariances, these equations are easily 
solved to yield the desired coefficients. 
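
To illustrate how the system (24.11) and (24.12) might be solved in practice, here is a minimal sketch using exact Gauss-Jordan elimination. The covariance figures are hypothetical, the observations are assumed to share a common mean μ, and the covariance matrix is assumed nonsingular; the helper names are our own.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[F(e) for e in row] + [F(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [e / piv for e in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [e - factor * c for e, c in zip(m[r], m[col])]
    return [row[n] for row in m]

def linear_credibility(mu, cov_obs, cov_next):
    """Coefficients alpha_0, ..., alpha_n of the best linear predictor.

    cov_obs[j][i] = Cov(X_i, X_j) among the observations,
    cov_next[j]   = Cov(X_j, X_{n+1}).
    """
    alphas = solve(cov_obs, cov_next)   # the equations (24.11)
    s = sum(alphas)
    alpha0 = (1 - s) * F(mu)            # equation (24.12)
    return alpha0, alphas

# Hypothetical inputs: two observations with variance 5 and all covariances 1.
alpha0, alphas = linear_credibility(10, [[5, 1], [1, 5]], [1, 1])
print(alpha0, alphas)
```

With these symmetric inputs the two observation weights come out equal, as one would expect.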


24.5.2 The Bühlmann model 


The formulas in the last section apply to a quite general situation, but it could be difficult to 
actually find the covariances. Using our framework, with the intermediate random variable 
Θ, we can determine these from our basic three parameters: 

μ = E(X) = E[E(X|Θ)],  v = E[Var(X|Θ)],  w = Var[E(X|Θ)]. 
Terminology w is sometimes called the variance of the hypothetical means while v is called 
the expected process variance. 
Applying Equation (20.24), for i ≠ j, 

$$E(X_iX_j) = E[E(X_iX_j|\Theta)] = E[E(X_i|\Theta)E(X_j|\Theta)] = E[(E(X|\Theta))^2],$$

where we use the fact that the $X_i$'s are independent conditional on Θ. This shows that 

$$\operatorname{Cov}(X_i, X_j) = E[X_iX_j] - \mu^2 = E[(E(X|\Theta))^2] - \mu^2 = w. \qquad (24.13)$$

From Theorem 24.2, v + w = $\operatorname{Var}(X_j)$ for all j and we substitute into (24.11) to get 

$$w = \alpha_j v + sw. \qquad (24.14)$$

This verifies that the $\alpha_j$'s are all equal (a fact that seems clear from the outset since all $X_j$ have 
the same distribution). Summing the above equations for j = 1, 2, ..., n we get nw = sv + snw 
so that 

$$s = \frac{n}{n+k}, \qquad (24.15)$$

where k = v/w. Now, using (24.12) we can write Equation (24.8) as 

$$\hat{X}_{n+1} = (1 - s)\mu + s\bar{X}, \qquad (24.16)$$


which is precisely of the form we noticed in the examples of the last section, with s as the 
credibility factor. This estimate is known as the Bühlmann credibility premium, named after 
Hans Bühlmann. The general form of the credibility factor (24.15) had been suggested from 
the early days of the subject, but the choice and determination of the constant k remained 
somewhat mysterious. Bühlmann's contribution was to identify this constant under least squares 
minimization. The Bühlmann credibility premium will be equal to the Bayesian credibility 
premium if and only if the latter is in fact a linear combination of the prior mean and the 
observed sample mean, as observed in Examples 24.6 and 24.7. 
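
As a quick sketch of the resulting formula, the function below computes the credibility factor and premium from the three parameters μ, v and w, and also checks Equation (24.14) for the common coefficient s/n. The function name and the numbers are hypothetical.

```python
def buhlmann_premium(mu, v, w, observations):
    """Credibility factor s = n / (n + k) and premium (1 - s) mu + s xbar."""
    n = len(observations)
    xbar = sum(observations) / n
    k = v / w                 # information coefficient
    s = n / (n + k)
    return s, (1 - s) * mu + s * xbar

# Hypothetical parameters: mu = 50, v = 200, w = 40, five observations.
mu, v, w = 50.0, 200.0, 40.0
obs = [60.0, 70.0, 80.0, 90.0, 100.0]
s, premium = buhlmann_premium(mu, v, w, obs)

# Each coefficient equals s / n and should satisfy w = alpha_j v + s w.
alpha_j = s / len(obs)
residual = w - (alpha_j * v + s * w)
print(s, premium, residual)
```

Here k = 5 equals the number of observations, so the prior mean and the sample mean receive equal weight.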


24.5.3 Bühlmann-Straub model 


The assumption that the $X_j$'s are identically distributed does not always hold in applications. 
Credibility theory is often applied to group insurance where an observation may be the 
average of several individual claims, and the number of such individuals may vary by period. 
In addition, there may be differences in the length of the observation period at different times. 
In many such cases we can handle the situation by a modification of the Bühlmann model, 
which involves introducing a weight $p_j$ for period j such that 

$$\operatorname{Var}(X_j|\Theta = \theta) = v(\theta)/p_j$$

for some function v. We now let v = E[v(Θ)], which is E[Var(X|Θ)] for an observation of unit weight. 

This will result in a generalization of our least squares estimate, known as the Bühlmann- 
Straub credibility premium. The original model covered the case where each $p_j = 1$. 

The decomposition of variance formula now becomes $\operatorname{Var}(X_j) = w + v/p_j$ so that when 
we substitute into Equation (24.11), we must modify Equation (24.14) to 

$$p_j w = \alpha_j v + p_j s w,$$

for each j. Letting $p = \sum_{j=1}^{n} p_j$, we sum this over all j to get 

$$pw = sv + psw = s(v + pw),$$

so that 

$$s = \frac{pw}{pw + v} = \frac{p}{p + k}.$$

Then (1 − s) = v/(pw + v) and from each of the starting equations we can write 

$$\alpha_j = \frac{p_j(1-s)w}{v} = \frac{p_j w}{pw + v}.$$

So we obtain the same formula as before, only now with the credibility factor applied to a 
weighted average of the observations; namely, for 

$$\bar{x} = \frac{1}{p}\sum_{j=1}^{n} p_j x_j,$$

the Bühlmann-Straub credibility premium is 

$$(1 - s)\mu + s\bar{x}.$$

Example 24.8 Let Y denote the aggregate claims per period for an individual, and suppose 
that E(Y) = 100, E[Var(Y|Θ)] = 125, Var[E(Y|Θ)] = 5. The last three years of data on a group 
insurance plan show that in the first year 40 employees had aggregate claims of 4000, in the 
second year 25 employees had aggregate claims of 2000 and in the third year 35 employees 
had aggregate claims of 2500. In the fourth year the group will consist of 45 employees. 
Estimate the aggregate claims for the fourth year. 


Solution. Let $X_i$ denote the average claim in period i. The Bühlmann-Straub model applies 
with weights $p_1 = 40$, $p_2 = 25$, $p_3 = 35$. (To verify this, note, for example, that $(X_1|\Theta)$ is the 
sum of 40 independent copies of (Y|Θ) divided by 40. So $\operatorname{Var}(X_1|\Theta) = 40\operatorname{Var}(Y|\Theta)/40^2 = \operatorname{Var}(Y|\Theta)/40$.) 
The credibility factor is given by 

$$Z = \frac{100}{100 + 125/5} = 0.8$$

and 

$$\bar{x} = \frac{4000 + 2000 + 2500}{100} = 85,$$

so that the estimate of the average claims in the fourth year is 

0.8(85) + 0.2(100) = 88 

and our estimated aggregate claims for year 4 is 45(88) = 3960. 
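
The arithmetic of Example 24.8 can be reproduced with a short sketch of the Bühlmann-Straub premium. The function name is our own; the inputs are the figures stated in the example.

```python
def buhlmann_straub_premium(mu, v, w, weights, observations):
    """Credibility factor p / (p + k), weighted mean, and premium per unit weight."""
    p = sum(weights)
    k = v / w
    z = p / (p + k)
    xbar = sum(pj * xj for pj, xj in zip(weights, observations)) / p
    return z, xbar, (1 - z) * mu + z * xbar

weights = [40, 25, 35]                      # employees in each year
totals = [4000, 2000, 2500]                 # aggregate claims in each year
averages = [t / n for t, n in zip(totals, weights)]

z, xbar, per_employee = buhlmann_straub_premium(100, 125, 5, weights, averages)
aggregate = 45 * per_employee               # 45 employees in year 4
print(z, xbar, per_employee, aggregate)
```

The observations passed in are the yearly averages per employee, since the weights are the numbers of employees.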


24.6 Conditions for exactness 


Looking at Examples 24.6 and 24.7, one might wonder whether the Bayesian credibility 
premium will always be a linear combination of the prior mean and the observed sample 
mean, so that the Bühlmann credibility estimate is exact. The answer is no. 


Example 24.9 Show that the Bühlmann credibility estimate in Example 24.3 is not exact. 


Solution. In this case μ = 3.2 and, from the solution to Example 24.5, 

$$Z = \frac{2}{2 + 19.6/2.16} = 0.1806,$$

and so the Bühlmann credibility premium for an observation of (10, 10) will be 

3.2(1 − 0.1806) + 10(0.1806) = 4.43. 

The approximation is close to but not exactly equal to the true answer of 137/31 ≈ 4.42. 


We can however identify some sufficient conditions for exactness. 

Theorem 24.3 Suppose that the following conditions hold. 


(i) There exist functions b and c, with c continuously differentiable, such that the density 
or probability functions of (X|Θ) are given by 

$$f(x; \theta) = \frac{b(x)e^{-\theta x}}{c(\theta)}, \qquad (24.17)$$

for all x in (m, n), where (m, n) is an interval (possibly infinite) that does not depend 
on θ. 


(ii) There exist parameters j, k such that the density or probability function of Θ is given 
by 

$$\pi(\theta) \propto c(\theta)^{-k}e^{-j\theta}, \qquad p < \theta < q, \qquad (24.18)$$

for the same function c as in (i). 

(iii) $\lim_{\theta \to p} \pi(\theta)$ and $\lim_{\theta \to q} \pi(\theta)$ exist and are equal. 


Then the Bühlmann credibility premium is exact. 


Proof. We give the proof in the case that f is a density function. The discrete case can be 
verified in the same manner with summation replacing the integration. The argument will be 
broken up into various steps. 


Step 1: We note first that conditions (i) and (ii) imply the property observed in the Section 
24.4 examples, namely that π* will have the same form as π but with changed 
parameters. Indeed, 

$$\pi^*(\theta) \propto f(x_1;\theta)f(x_2;\theta)\cdots f(x_n;\theta)\,\pi(\theta) \propto c(\theta)^{-(n+k)}e^{-\theta(n\bar{x}+j)},$$

which is the same form as π but with parameters 

k* = n + k,  j* = n x̄ + j. 


Step 2: We show that 

$$E(X|\Theta = \theta) = -\frac{c'(\theta)}{c(\theta)}. \qquad (24.19)$$

For any θ in (p, q) we know that f(x; θ) integrates to 1 over the interval (m, n) so 
that 

$$c(\theta) = \int_m^n e^{-\theta x}b(x)\,dx.$$

Differentiating with respect to θ, 

$$c'(\theta) = \int_m^n -xe^{-\theta x}b(x)\,dx = -E(X|\Theta = \theta)\,c(\theta),$$

establishing (24.19). 

Step 3: We show that 

$$E(X) = j/k. \qquad (24.20)$$

Taking logarithms in (24.18) gives 

$$\log \pi(\theta) = \log(K) - k\log c(\theta) - j\theta,$$

where K is the constant of proportionality. Differentiating with respect to θ and 
substituting from (24.19), 

$$\frac{\pi'(\theta)}{\pi(\theta)} = -k\,\frac{c'(\theta)}{c(\theta)} - j = kE(X|\theta) - j.$$

Now multiplying by π(θ), integrating, and using Equation (20.24), 

$$\int_p^q \pi'(\theta)\,d\theta = k\int_p^q E(X|\theta)\pi(\theta)\,d\theta - j\int_p^q \pi(\theta)\,d\theta = kE[E(X|\Theta)] - j = kE(X) - j.$$

The left-hand side is $\lim_{\theta \to q}\pi(\theta) - \lim_{\theta \to p}\pi(\theta)$, which equals 0 by condition 
(iii), and this establishes (24.20). 


Step 4: In conclusion, we apply Step 3 to the new (posterior) distribution and note that the 
credibility premium is 

$$E(X^*) = \frac{j^*}{k^*} = \frac{n\bar{x} + j}{k+n} = \left(\frac{k}{k+n}\right)\frac{j}{k} + \left(\frac{n}{k+n}\right)\bar{x} = \left(\frac{k}{k+n}\right)E(X) + \left(\frac{n}{k+n}\right)\bar{x}.$$

Since the credibility premium is a linear combination of x̄ and E(X), we know from our 
previous remarks that the Bühlmann estimate is exact. 
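
The steps of the proof can be checked numerically on a simple case. The sketch below assumes exponential claims, f(x; θ) = θe^{−θx} for x > 0, which has the form (24.17) with b(x) = 1 and c(θ) = 1/θ, together with the prior π(θ) ∝ θ^k e^{−jθ} on (0, ∞). This choice of family, the parameter values and the function names are our own illustrative assumptions.

```python
import math

def c(theta):
    # Normalising function for f(x; theta) = theta * exp(-theta * x).
    return 1.0 / theta

def conditional_mean(theta, h=1e-6):
    # (24.19): E(X | theta) = -c'(theta) / c(theta), via a central difference.
    dc = (c(theta + h) - c(theta - h)) / (2.0 * h)
    return -dc / c(theta)

def bayes_premium_numeric(j, k, xs, steps=100_000):
    # Posterior is proportional to theta**(n + k) * exp(-(j + sum(xs)) * theta).
    n, rate = len(xs), j + sum(xs)
    upper = 60.0 * (n + k + 1) / rate     # well past the bulk of the posterior
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        t = (i + 0.5) * h                 # midpoint rule
        post = t ** (n + k) * math.exp(-rate * t)
        num += post / t                   # E(X | theta) = 1 / theta
        den += post
    return num / den

def credibility_form(j, k, xs):
    # Step 4: (k / (k + n)) E(X) + (n / (k + n)) xbar, with E(X) = j / k.
    n = len(xs)
    z = n / (n + k)
    return (1 - z) * (j / k) + z * (sum(xs) / n)

xs = [1.0, 2.5, 0.5]                      # hypothetical observed claims
numeric = bayes_premium_numeric(6.0, 3.0, xs)
closed = credibility_form(6.0, 3.0, xs)
print(conditional_mean(2.0), numeric, closed)
```

The numerically integrated Bayesian premium and the linear credibility form agree to within the integration error, as the theorem predicts for this family.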


Remark The final step shows also that the information coefficient of Θ with respect to X 
is given by the parameter k in the density π(θ). This can be verified by direct calculation. 


Remark Distributions satisfying (24.17) are said to be from a linear exponential family. The 
name reflects the fact that x and θ are related only through an exponential term. Distributions 
of the form (24.18) are what are called conjugate priors for f(x; θ). This means that, as we 
noted in Step 1, the altered distribution has the same form as the original, but with different 
parameters. 

Condition (iii) is automatically satisfied when the interval (p, q) is the whole real line, 
since the fact that π integrates to the finite number 1 means that the limits in question must 
both be zero. In another common case where the interval is (p, ∞), we need the limit at p to 
be zero, which will hold if and only if $\lim_{\theta \to p} c(\theta)^{-k} = 0$. 


The form of the density in (i) of Theorem 24.3 is not as restrictive as might appear, and 
many cases can be put into this form with a change of parameter. Several of these are covered 
by the following generalization. 


Theorem 24.4 Suppose that (iii) of Theorem 24.3 holds while (i) and (ii) are modified 
to 


(i) There exist functions b, c, a, where c and a are continuously differentiable, such that 
the density or probability functions of (X|Θ) are given by 

$$f(x; \theta) = \frac{b(x)e^{a(\theta)x}}{c(\theta)}, \qquad (24.21)$$

for all x in an interval (m, n) that does not depend on θ. 


(ii) There exist parameters j, k such that the density or probability function of Θ is given 
by 

$$\pi(\theta) \propto c(\theta)^{-k}e^{j\,a(\theta)}, \qquad p < \theta < q, \qquad (24.22)$$

for the same functions a and c as in (i). 


Then the Bühlmann premium is exact. 


Proof. Follow the proof of the previous theorem, but simply note that in Step 2 we now get 

$$c'(\theta) = E(X|\theta)\,c(\theta)\,a'(\theta).$$

The a'(θ) is cancelled out in Step 3 when we multiply by π(θ), and the proof is completed as 
above. 


Note that this generalizes Theorem 24.3, which is the particular case with a(0) = —0. 

We now show that the two examples given in Section 24.4 are covered by the above theorem 
and therefore we could have written down the Bayesian credibility premium immediately 
without going through the calculations. 

In Example 24.6, with v = 1, we can expand the square and write f(x; θ) in the form of 
(24.17) with $c(\theta) = e^{\theta^2/2}$. By again expanding the square in the expression for π(θ), we see 
that it is of the required form (24.18) for the same c(θ) with k = 1/w and j = μk. 

In Example 24.7 we can apply Theorem 24.4 with a(θ) = log(θ), $c(\theta) = e^{\theta}$, k = β, 
j = α. 
24.7 Estimation 


In many cases we do not have any information regarding the distribution of X given Θ or the 
distribution of Θ, and we must rely strictly on the observed data alone. The data now perform 
double duty, giving us not only the credibility factor, but also an estimate of the parameters 
of the distributions. We begin with the Bühlmann model and provide one possible method for 
estimating μ, v and w. 


24.7.1 Unbiased estimators 


This is the only time in the book where we delve into the field of statistical inference. Many 
readers will be quite familiar with the material in this section, but for completeness we give a 
quick discussion of the concept of an unbiased estimator. 

Suppose we want to estimate a parameter ν of a random variable X on the basis of 
observations $X_1, X_2, \ldots, X_n$. To do so we choose a function ν̂ of n variables and if the observed 
values are $(x_1, x_2, \ldots, x_n)$, then we estimate the value of ν as $\hat{\nu}(x_1, x_2, \ldots, x_n)$. So ν̂ is a random 
variable, since it depends on the particular observations. It is known as an estimator of ν. 

For an easy example, suppose we want to estimate the mean μ of a distribution. A natural 
way of doing this is to take the sample mean as an estimate. That is 

$$\hat{\mu} = \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}. \qquad (24.23)$$

So, for example, if n = 3 and we observe the values 10, 5, 9, we would estimate the mean 
to be 8. 
One desirable property of an estimator is that the values it gives will average to the true 
value. That is, we would like that 

E(ν̂) = ν. 

Such an estimator is said to be unbiased. For example, μ̂ as given above is unbiased since 

$$E(\hat{\mu}) = (1/n)[E(X_1) + E(X_2) + \cdots + E(X_n)] = (1/n)\,nE(X) = \mu.$$


Another quantity which we often want to estimate is the variance. Obtaining an unbiased 
estimator is not quite as obvious in this case. A possible guess might be the estimator 

$$\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2,$$

but this turns out to be biased. We will derive a natural unbiased estimator in the case that the 
observations are independent. 
First note that 

$$\operatorname{Var}(\bar{X}) = \frac{1}{n^2}\,n\operatorname{Var}(X) = \frac{1}{n}\operatorname{Var}(X). \qquad (24.24)$$

This is a natural result. When we average over independent observations, the high and low 
values will tend to cancel one another. We can expect to get a result closer to the mean than 
with just one observation, so that the variance reduces. 

Now consider (24.4) with c = μ, applied to random variables $X_i$ instead of numbers, and 
take expectations. For the term on the left of (24.4) we get n Var(X) and for the term on the 
far right we get n Var(X̄) = Var(X) by (24.24). Rearranging, 

$$E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = n\operatorname{Var}(X) - \operatorname{Var}(X) = (n-1)\operatorname{Var}(X).$$

This shows that an unbiased estimator of Var(X) is given by 

$$\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2. \qquad (24.25)$$

Note what happens in the case that n = 1. Clearly one observation gives us absolutely no 
information at all regarding the variance, and that is reflected in the fact that our estimator is 
not defined. 
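
The bias of the divide-by-n estimator can be seen exactly by enumerating every possible sample from a small distribution. The sketch below assumes a fair coin taking the values 0 and 1, so that Var(X) = 1/4, and samples of size 3; these choices and the function names are purely illustrative.

```python
from fractions import Fraction as F
from itertools import product

values = {0: F(1, 2), 1: F(1, 2)}   # assumed distribution: Var(X) = 1/4
n = 3

def expected(estimator):
    """Exact expectation of an estimator over all independent samples of size n."""
    total = F(0)
    for sample in product(values, repeat=n):
        prob = F(1)
        for x in sample:
            prob *= values[x]
        total += prob * estimator(sample)
    return total

def divide_by_n(sample):
    m = F(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / len(sample)

def divide_by_n_minus_1(sample):
    m = F(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

biased = expected(divide_by_n)
unbiased = expected(divide_by_n_minus_1)
print(biased, unbiased)
```

The first expectation falls short of Var(X) by the factor (n − 1)/n, while the second equals Var(X).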


24.7.2 Calculating Var(X̄) in the credibility model 


Section 24.7.1 is perfectly general. We now want to consider the particular case of the 
variance of the sample mean in the basic setup of credibility theory. A major problem is that 
the observations are not independent, so that (24.24) does not hold. 

Consider, however, our basic decomposition from Theorem 24.2, 

Var(X) = Var[E(X|Θ)] + E[Var(X|Θ)]. 

Substituting X̄ for X and using the fact that the observations are independent, conditional on Θ, 
we see that the first term will be equal to Var[E(X̄|Θ)] = Var[E(X|Θ)] = w, while the second 
term will be E[Var(X̄|Θ)] = E[Var(X|Θ)]/n = v/n. So we can write 

$$\operatorname{Var}(\bar{X}) = w + v/n. \qquad (24.26)$$

This result can be explained intuitively. Consider that part of the uncertainty that is due 
to the lack of homogeneity, that is, the variation due to differences in the risk parameter. That 
does not get reduced by taking several observations from a single policyholder, and averaging. 
Only the part that is due to the variation for a given value of Θ is divided by n and therefore 
reduced. 
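
Formula (24.26) can be verified by brute force on a small discrete model. The sketch below takes the probabilities of Example 24.3 as assumed inputs, for which w = 2.16 and v = 19.6, and enumerates every sample of size n drawn conditionally independently given Θ. The function name is our own.

```python
from fractions import Fraction as F
from itertools import product

prior = {1: F(3, 5), 2: F(2, 5)}
pmf = {1: {0: F(4, 5), 10: F(1, 5)},
       2: {0: F(1, 2), 10: F(1, 2)}}
w, v = F(54, 25), F(98, 5)            # Var[E(X|Theta)] and E[Var(X|Theta)]

def var_of_sample_mean(n):
    """Exact Var(Xbar) when the n observations are independent given Theta."""
    first = second = F(0)
    for theta, p_theta in prior.items():
        for sample in product(pmf[theta], repeat=n):
            prob = p_theta
            for x in sample:
                prob *= pmf[theta][x]
            xbar = F(sum(sample), n)
            first += prob * xbar
            second += prob * xbar ** 2
    return second - first ** 2

for n in (1, 2, 3):
    print(n, var_of_sample_mean(n), w + v / n)
```

Only the v/n term shrinks as n grows; the w term is unaffected by taking more observations from the same individual.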


24.7.3 Estimation of the Bühlmann parameters 


In our previous models we used the observations only to obtain x̄. In this case, where we also 
want to estimate parameters from the data, we must make some more refined observations. In 
this instance, it will not be sufficient to look at claims from a single individual only. We need 
to have data from r individuals, where r ≥ 2. We suppose, therefore, that we are going to take 
n observations from each of r individuals. We distinguish between observations from different 
individuals, which are independent, and observations from the same individual, which are only 
independent conditional on Θ. Let $X_{ij}$ be the value of the jth observation from individual i. We 
now show how to derive unbiased estimators of each of the three parameters from this data. 


Estimation of μ. In this case there is no need to consider the independence structure 
and we can just take the mean of the entire sample as an estimate, as we did in the case of 
known distributions. That is, we take 

$$\hat{\mu} = \bar{X} = \frac{1}{rn}\sum_{i=1}^{r}\sum_{j=1}^{n} X_{ij}. \qquad (24.27)$$

This can also be calculated as 

$$\bar{X} = \frac{1}{r}\sum_{i=1}^{r} \bar{X}_i,$$

where $\bar{X}_i = (1/n)\sum_{j=1}^{n} X_{ij}$, the mean of the observations for individual i. 


Estimation of v. Let $\theta_i$ be the (unknown) risk parameter of individual i. By the 
independence conditional on Θ we can take 

$$\hat{v}_i = \frac{1}{n-1}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2$$

as an unbiased estimator of $\operatorname{Var}(X|\Theta = \theta_i)$. Therefore, an unbiased estimator of v = 
E[Var(X|Θ)] is given by averaging the above over all r individuals. The estimator is given by 

$$\hat{v} = \frac{1}{r}\sum_{i=1}^{r} \hat{v}_i.$$

Estimation of w. From (24.26) 

w = Var(X̄) − v/n. 

Now $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_r$ constitute r independent observations of X̄, so from (24.25) 

$$\frac{1}{r-1}\sum_{i=1}^{r}(\bar{X}_i - \bar{X})^2$$

is an unbiased estimator of Var(X̄). Then, from the estimate of v, invoking (A.8) and (A.22), 
we can take 

$$\hat{w} = \frac{1}{r-1}\sum_{i=1}^{r}(\bar{X}_i - \bar{X})^2 - \frac{\hat{v}}{n}$$

as an unbiased estimator of w. 
One problem with this approach is that the estimate ŵ often turns out to be negative, while we 
know w is the expectation of a nonnegative random variable. Note that this does not contradict 
unbiasedness. It simply means that in this particular case the estimate of w is too low and 
indicates that there will be other sets of observations that will give an estimate of w which 
is too high. It points out the fact that the property of being unbiased is not always by itself 
sufficient to ensure a reasonable estimator. The method of handling this problem is normally 
to set ŵ = 0. This means that Z = 0, and the credibility premium is simply the mean of the 
data. The conclusion is in effect that the data do not show sufficient variability between 
individuals to indicate any departure in our estimates from the observed overall mean. 


Example 24.10 Policyholder 1 has aggregate claims in the first three periods of (4, 2, 6). 
Policyholder 2 has aggregate claims in the first three periods of (6, 5, 10). Estimate the credibility 
premium for each policyholder. 


Solution. The sample means are $\bar{X}_1 = 4$, $\bar{X}_2 = 7$ and $\bar{X} = 5.5$. We have that 

$\hat{v}_1 = (0 + 4 + 4)/2 = 4$,  $\hat{v}_2 = (1 + 4 + 9)/2 = 7$,  $\hat{v} = 11/2$, 
$\hat{w} = (1.5^2 + 1.5^2) - 5.5/3 = 8/3$. 


Our estimate of the information coefficient is then (11/2)/(8/3) = 33/16, so the 
credibility factor is 3/(3 + 33/16) = 0.593. 

The estimated credibility premium for policyholder 1 is 0.593(4) + 0.407(5.5) = 4.61. 

The estimated credibility premium for policyholder 2 is 0.593(7) + 0.407(5.5) = 6.39. 
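
The estimation procedure lends itself to a short routine. The sketch below assumes equal numbers of observations per policyholder and sets the credibility factor to zero when the estimate of w is not positive; the function name is our own, and the data are those of Example 24.10.

```python
from fractions import Fraction as F

def buhlmann_estimates(data):
    """Estimates of mu, v, w, the credibility factor and the premiums.

    data: one list of n observations per policyholder (r lists, r >= 2).
    """
    r, n = len(data), len(data[0])
    means = [F(sum(row), n) for row in data]
    overall = sum(means) / r
    # Average of the within-policyholder unbiased variance estimates.
    v_hat = sum(sum((x - m) ** 2 for x in row) / (n - 1)
                for row, m in zip(data, means)) / r
    # Variance of the policyholder means, less v_hat / n.
    w_hat = sum((m - overall) ** 2 for m in means) / (r - 1) - v_hat / n
    z = F(0) if w_hat <= 0 else n / (n + v_hat / w_hat)
    premiums = [(1 - z) * overall + z * m for m in means]
    return overall, v_hat, w_hat, z, premiums

overall, v_hat, w_hat, z, premiums = buhlmann_estimates([[4, 2, 6], [6, 5, 10]])
print(overall, v_hat, w_hat, float(z), [float(p) for p in premiums])
```

Working in exact fractions gives Z = 16/27, which rounds to the 0.593 used in the example.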


24.7.4 Estimation in the Bühlmann-Straub model 


The procedure described in this section can be modified to handle estimation in the Bühlmann- 
Straub model. The results are similar, but one must be careful with the handling of the weights. 
We will list the final result here without going through the details of the derivation. As a further 
generalization we will allow the number of observations to vary with the individual, which is 
useful for some applications. As above, we suppose we have observations from r individuals 
where r > 1. Let 

$n_i$ be the number of observations for individual i, 
$p_{ij}$ be the weight attached to the jth observation of individual i, 
$p_i = \sum_j p_{ij}$,  $p = \sum_i p_i$. 

Then, if $X_{ij}$ is the value of the jth observation of individual i, we can take for unbiased 
estimators of the parameters 

$$\hat{\mu} = \bar{X}, \qquad \hat{v} = \frac{\sum_i \sum_j p_{ij}(X_{ij} - \bar{X}_i)^2}{\sum_i (n_i - 1)}, \qquad \hat{w} = \frac{\sum_i p_i(\bar{X}_i - \bar{X})^2 - (r-1)\hat{v}}{p - p^{-1}\sum_i p_i^2},$$

where 

$$\bar{X}_i = \frac{1}{p_i}\sum_j p_{ij}X_{ij} \qquad \text{and} \qquad \bar{X} = \frac{1}{p}\sum_i p_i \bar{X}_i.$$

In the case that each $p_{ij} = 1$ and $n_i = n$ for all i, we have $p_i = n$, p = rn and it is easily 
verified that the estimators reduce to those given in the previous section. 
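
A sketch of the weighted estimators follows. It assumes the standard weighted forms, namely that v is estimated by the weighted within-individual sum of squares divided by the total of the n_i − 1, and w by the weighted between-individual sum of squares less (r − 1) times the estimate of v, divided by p − (1/p) times the sum of the squared p_i. The function name is our own, and the check uses unit weights so that the result can be compared with the unweighted case.

```python
from fractions import Fraction as F

def buhlmann_straub_estimates(x, p):
    """Weighted estimates of mu, v and w.

    x[i][j]: jth observation of individual i
    p[i][j]: weight attached to that observation
    """
    r = len(x)
    p_i = [sum(row) for row in p]
    p_tot = sum(p_i)
    # Weighted mean for each individual and overall.
    xbar_i = [sum(F(pij) * xij for pij, xij in zip(prow, xrow)) / pi
              for prow, xrow, pi in zip(p, x, p_i)]
    xbar = sum(pi * xi for pi, xi in zip(p_i, xbar_i)) / p_tot
    # Within-individual weighted sum of squares over total degrees of freedom.
    v_hat = (sum(F(pij) * (xij - xi) ** 2
                 for prow, xrow, xi in zip(p, x, xbar_i)
                 for pij, xij in zip(prow, xrow))
             / sum(len(row) - 1 for row in x))
    # Between-individual weighted sum of squares, corrected for v_hat.
    w_hat = ((sum(pi * (xi - xbar) ** 2 for pi, xi in zip(p_i, xbar_i))
              - (r - 1) * v_hat)
             / (p_tot - F(sum(pi ** 2 for pi in p_i), p_tot)))
    return xbar, v_hat, w_hat

# Unit weights and equal sample sizes: the data of Example 24.10.
data = [[4, 2, 6], [6, 5, 10]]
unit = [[1, 1, 1], [1, 1, 1]]
mu_hat, v_hat, w_hat = buhlmann_straub_estimates(data, unit)
print(mu_hat, v_hat, w_hat)
```

With unit weights the routine returns the same three values as the unweighted estimators of the previous section.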


Notes and References 


This chapter constitutes a basic introduction to credibility theory, and certain topics are 
omitted. We have not covered limited fluctuation credibility theory, an older approach to the 
subject, that is still in use, despite some theoretical drawbacks. There are also many aspects 
of the estimation of parameters that we have not covered. There are alternate methods which 
avoid the problem of a negative estimate of w. Another frequent application is where some, 
but not all, of the features of the underlying distributions are known, so different estimation 
procedures are employed. A common occurrence of this type is one where f(x; 0) can be 
specified, but there is no reasonable way to determine z(0). Readers can consult Klugman 
et al. (2012) or Herzog (1999) for more information on these topics. For a more advanced and 
complete treatment of credibility theory, see Bühlmann and Gisler (2005). 


Exercises 


24.1 A die is selected at random from an urn that contains two six-sided dice. Die number 
1 has three faces with the number 2, while the other three faces are numbered 1, 3, 
4. Die number 2 has three faces with the number 4, while the other three faces are 
numbered 1, 2, 3. The first five rolls of the die yielded the numbers 2, 3, 4, 1 and 4, 
in that order. Determine the expected number for the sixth roll of the same die. 


24.2 Urn A has four balls numbered 1-4. Urn B has six balls numbered 1-6. An urn is 
selected at random, and then a random draw produces a ball with number 4, which is 
replaced. A ball is then drawn randomly from the same urn. Find the expected number. 


24.3 Three urns contain balls marked either 0 or 1. In urn A, 10% are marked 0; in urn B, 
60% are marked 0; and in urn C, 80% are marked 0. An urn is selected at random and 
three balls selected with replacement. The total of the values is 1. Three more balls 
are selected with replacement from the same urn. Find the expected total on the three 
balls. 


24.4 One spinner is selected at random from a group of three spinners. Each spinner is 
divided into six equally likely sectors. The number of sectors marked 0, 12 and 48, 
respectively, on each spinner is as follows: Spinner A: 2, 2, 2; Spinner B: 3, 2, 1; and 
Spinner C: 4, 1, 1. A spinner is selected at random and a 0 is obtained on the first spin. 
What is the expected value for a second spin of the same spinner? 


24.5 Complete Example 24.3 by finding the expected claim amounts given observed claims 
of (a) (0,0), (b) (0,10), (c) (10,0). 


24.6 The number of claims in 1 year has a Poisson distribution with parameter Θ. The 
parameter Θ has a gamma distribution with mean 2 and variance 2. A particular 
insured had one claim in 1 year. What is the expected number of claims for this 
policyholder for the next year? 


24.7 Suppose that X ~ Bin(m, Θ), where Θ has a Beta(α, β) distribution. That is 

$$\pi(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}, \qquad 0 < \theta < 1.$$

Given n independent observations of X, find the expected value of the next observation 
in terms of the parameters n, m, x̄, α, β. 


24.8 Repeat Exercise 24.6 only now assuming that X ~ Negbin(r, Θ). 

24.9 In Exercise 24.2, find the Bühlmann estimate of the expected number on the next ball. 


24.10 In Exercise 24.3, find the Bühlmann estimate of the expected total on the next three 
balls. 


24.11 In the Bayesian credibility model, X|Θ ~ Exp(Θ), where Θ ~ Gamma(, 1). An 
individual has a claim of 5 in the first period. Find the expected claim in the second period 
for the same individual. 


24.12 X has a Poisson distribution with parameter Θ, where $\pi(\theta) = 3\theta^{-4}$, for θ > 1. A 
particular insured experienced a total of 20 claims in the previous 2 years. 

(a) Determine the Bühlmann credibility estimate. 
(b) Determine the exact Bayesian credibility premium in terms of an integral. 


24.13 Suppose that X and Θ satisfy the conditions of Theorem 24.2. You are given that E(X) = 
1, $E(X|X_1 = 4) = 2$, where $X_1$ is the value of a single observation. If E[Var(X|Θ)] = 3, 
find Var[E(X|Θ)]. 


24.14 Redo Example 24.8, only now assuming that the aggregate claims for the first 3 years 
are (4500, 2200, 3000) and that E[Var(Y|Θ)] = 125. 


24.15 A taxi-cab company keeps a small fleet of cars, the number of which can vary from 
year to year. For each car, the number of accidents in a year is Poisson distributed 
where the Poisson parameter is uniform on [0,1]. The data for the past 3 years show 
one accident from four cars, two accidents from five cars and zero accidents from 
two cars. Next year the company will have three cars. What is the Bühlmann-Straub 
estimate for the number of accidents next year? 


24.16 (a) Redo Example 24.10 assuming a third policyholder is observed, with claims in 
the first three periods of (7,5,6). 


24.17 


24.18 


24.19 
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(b) Now, assume a fourth policy holder is observed with claims of (1,0,2), and calcu- 
late the credibility premiums. 


(c) Explain briefly why the value of the credibility coefficient Z is lower in (a) than 
in the original example,despite the fact that there is additional data. Why does it 
increase in (c). 


Show that exact credibility holds for the distributions of Exercises 24.7 and 24.8. 
Identify the value of k in each case. 


Suppose that in the Monte Hall problem, the strategy of the host is as follows. If you 
pick the correct door, the host will either open the lowest numbered blank door, the 
highest numbered blank door or not open any door, each with probability 1/3, and if 
you pick an incorrect door, the host will either open the remaining blank door, or not 
open any door, each with probability 1/2. If the host opens a blank door, should you 
switch or not? 


Show that E,(W), as defined in Section 20.8 is equal to E(W|Y) as defined in Section 
24.2 for a suitable random vector Y. 


Answers to exercises 


(c) 465 


Chapter 2 
2.1 (a) 6 (b) 150
2.2 1
2.3 4.1
2.4 (a) (1, 2/3, 4/9, 8/27, 2/9, 1/6) (b) 205
2.5 (a) 1.5 (b) 10.75
2.6 (a) -19/12 (b) 40/3
2.7 (a) 360 (b) 440
2.8 For last part, 17 = 4.4 + 0.7 × 18
2.11 The first.
2.14 d,=12.2
2.15 0.8
2.16 (a) 9980.89 (b) 10 117.40
2.17 (a) better off by 133.72 (b) worse off by 61.38
2.18 v(0, 1) = 0.7, v(1, 2) = 0.4375.
2.20 (b)i,-2r/(1-4 r)
2.23 746.92
2.24 0.045
2.25 Initial payment is 8177.15. As examples, outstanding balance at time 5 is 49 560.15,
and at time 15 is 19 132.59.


Fundamentals of Actuarial Mathematics, Third Edition. S. David Promislow. 
© 2015 John Wiley & Sons, Ltd. Published 2015 by John Wiley & Sons, Ltd. 
Companion website: http://www.wiley.com/go/actuarial 
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Chapter 3 


3.1 


3.2 
3.3 
3.4 
3.5 
3.7 
3.9 


0.288 
1/10, 1/6, 4/15, 11/20 

(a) l —n/(100 —x) (b)n/(100—3) (c) K/(100 — x), for suitable n, k 
22.8 

(a) (1 — q) 


Chapter 4 


4.1 
4.2 
4.3 
44 
4.5 
4.6 
4.7 
4.9 

4.10 

4.11 

4.13 

4.14 

4.15 

4.16 


3.64 
3316.62 


(a) 1000, 800, 600, 450, 315, 189. 
(b) (i) 285/800 (ii) 189/600 
(c) 2.5052, 2.1315, 1.842, 1.456, 1.08 


(b) (1 — q)/4 
eg = 79.83 for original table. New values are 77.72, 78.41. 


(a) 609.01 (b) 755.95 


16.2 
11111 


Yso © 20) = 0.384, ys3(2) = 0.48 


1/( - v1 = 9) 
19 900 

3.29 

153.85 

947.83 

274 154 

(a)20 (b)13 


G1 40}+20 = 11.54, 
interest years. 


Chapter 5 


5.1 
2:2 
53 


0.0760 
135.04 
55.06 


d(50)410 = 10.77 which is lower since is covers more of the high 
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5.4 (0.05, 0.1, 0.15) 
5.5 (a) S = 10004(055,1,,)/C1 — Agg(125)) (b) 2222.22 
5.6 9.57% 
9.7 nds 
5.8 q/(itq) 
5.9 Lower 
5.10 625 
5.11 0.3015 
5.12 0.4 
5.16 (a) 
p = 100LoP40%(O20, Lio; V) + 4400030 13] — 1 49.3, ... 9, 10, 10,..., 10) 
ü49(1,9) — A49G) eS 
11 times 
(b) 
Ta 1000[4(1 10; v) + 4[40}420(910> Lo)] 
GC 19; v)v(20, 0) 
5.19 Change the vector j so that jj; = 0.5 Val, i (15,,; 5) ifü € k <5 
5.20 194.08 
5.23 1928.27 
5.24 469.67 
5.25 (a)19.53 (b) 12.49 
Chapter 6 
6.1 444.58, 506.36, 985.84, 4000 
6.2. 467.74, 451.61, 865.59, 2000 
6.3 (a) (1, 0.75,0.51) (b) (—1005.79, —760.88, 853.19) (c) —1341.05, —853.19 
(d) Benefits decrease 
6.4 420, 580, 900 
6.5 80 
6.6 (a) 91.40 + 376.34, 222.88 + 240.86, 0 + 467.74 
(b) Interest gain = —45.97, mortality gain = 56.73 
6.7 39.49 + 405.09, 162.27 + 282.31, —125 + 1014.16 
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6.8 (a) 160 (b) 1065 — 662 (c) Interest gain — 12, mortality gain — 32 
6.10 (a) 311.87 (b)398.50 (c) 3200 

6.11 650 

6.13 (a)3.4 (b) —0.048 

6.14 (a)10 (b)2 

6.17 (a)200 (b)40 

6.18 (a)125 (b),V — 50, ¿V = 150, k = 2,3,... 

619 j»i 

6.20 2307.69 


6.22 (a) Premium = 368.85. For example, 55V = 14.473. 
(b) Decrease for first 15 years. (c) Increase for first 15 years.


Chapter 7 
7.1 (a) 153002 (b) 152 113 
7.2 (3) 3017 (b) 2665 
7.3 (a)96670 (b) 96071 
7.4 1/3 
7.5 0.56 
7.6 0.144 
7.7 2847 
7.8 200 
7.9 0.64 
7.11 Overstate by 0.0582 
7.12. 103.19, 502.90 


Chapter 8 

8.1 0.96 

82 23237 

8.3 679.50 

8.4 (a)0.2083 (b)0.179 26 
8.5 (a) 1.744 (b) 1.733 
8.6 0.292 69 


8.7 
8.8 
8.9 
8.10 
8.11 
8.14 
8.15 
8.16 
8.17 


8.23 
8.24 
8.25 
8.27 
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(a) 0.4541 (b) 0.093 02 
0.777 

(a) 50 (b) 312.50 
(a)0.488 (b) 0.421 

0.3 

z — u(u + ó)/(u  ó—y) 
0.81 

2/(140 — n) 


= ] — e7”1+ô) ^ g-nünó) i in (1- gw, 


uet +ô) 


a, = š = 
i My +6 Hy + 6 i ui +ô 


20 

(a) 0.00047877 (b) 0.003560, 0.003564 

(a) 0.89944 (b) 0.90281 (c) 0.89673 (d) 0.89674 
(a) (3/4)(e°* — 1) 


Chapter 9 


9.1 
9.2 
9.3 


9.4 
9.5 
9.6 
9.7 


0.274 68 
(a) 137.27 (b) 51.63 


Hy +ô 


The probability that a person age 40, first observed age 30, will die between the ages 


of 46 and 54 

liso] = 6313, Zie = 5553, lieg = 4717 
0.03163 

(a) 2957.31 (b) 35 664.74 

(a) 23.108 (b) 23.089 


Chapter 10 


10.1 
10.2 


(a) 0.4, 0.1. (b)0.9,0.55 (c)1480 (d)864 (e)2392 (f)5 


both atleastone exactly one 


28 


neither 


live 5 years 0.48 0.92 0.44 
die within 5 years 0.08 0.52 0.44 
die between time 5 and 6 0.02 0.28 0.26 


0.08 
0.48 
0.72 
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10.3 
10.4 
10.5 
10.6 
10.7 
10.8 
10.9 
10.10 
10.11 
10.12 
10.13 
10.14 


10.15 
10.16 
10.17 


10.19 
10.20 
10.21 
10.22 


10.23 
10.24 
10.25 
10.26 
10.27 
10.29 
10.30 
10.31 
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403.70 

9/110 

(a) 0.3352 (b) 0.1648 (c) 1.4 

(a) 9/32 (b) 3/32 

0.1757 

0.7294 

1.2 

(a) 0.1547 (b) 0.0859. 

(a) 409.72 (b) 647.87 (c) 374.92 (d) 350.85 
ü49(159) + dso(019. Lao) — 49:59 (010. 110) 


449129) + å50(010; 159) — d49:59(010. 110) 


3A, + 3A, - Ay, 
Q/3)à, + Q/3)à, — 1/3)axy 


å (1210; 649) + à, (1219, 819) — à, (1219. 219) 
Aá49(139) + 5ásg(159) + d49:59(—359 210) 


For all values of t, ,V — O if both are alive, 0.1765 if (x) only is alive, 0.8824 if y only 
is alive. 


0.6720 
Either P(D) < P(B) € P(A) € P(C) or P(D) < P(A) < P(B) < P(C) 


1.0041 


(a) ndi, T d ari 2,0, 


(b) nx Tn al, Tn diy Tn di. 


(a) Ā2 (b) = Á,(b) + AL (b) - A, (b). (b) A2. (b) = AS — A 
0.14. 


(b) 


2 
xy 


A! 


Ay(1,) + V^,p, y*nx 
a(1,) + V" PrOxtny T V" Dy nx A V aPxyðx+n:y+n 
(a) -1/7 (b) (-1/y) - 1/(u — y) 

20.50 

V” PxAytn = Õy4n:x 


(a) 211.44 (b)158.20 (c)0.749 
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Chapter 11 
11.1 29/110 
11.2 (a) 0.6852 (b) 0.3245, 0.4445 
11.3 1/30 
11.4 Method 1 gives 0.219 and 0.438. Method 2 gives 0.2235 and 0.4335. 
11.5 0.04595 
11.6 0.0552, 0.0869, 0.1074 
11.7 Method 1: 0.0786, 0.1666, 0.2148 Method 2: 0.0792, 0.1667, 0.2142 
11.9 5/9 
11.10 (a) 0.162 67, 0.325 33. (b) 0.164, 0.324 (c) 0.182, 0.306 


11.11 Letting a denote dq and b denote a?b?)/4; for (11.18), 
(a*b + b’a)/(4 — 2a — 2b + ab). 


11.12. False. The left hand side equals 2/2. 
11.13 0.384 

11.14 0.14, 0.18 

11.16 0.063 07, 0.132 07, 0.208 40, 0.294 07 
1117. q® 2 9.111, q” = 0.185 

11.18 Using the notation of Example 11.5 


b=b' oa 227 
/ 16 ro b'c! 47 ILII 
= - b 
dS iu 2 4997 €* 
Chapter 12 
12.1 G = 100550 A, (159) + à, (1020, 260,5) 
à,(0.4, 0.959) 
122 30 
12.3 1175.86 
12.4 97.46, 24.14, 260.77 
12.5 Ay42(b o 2) 


x45 (1, 2) 
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12.8 (77.65, 4.76, 175.56, —31,20, 0.25, —0.16) 
12.9 (a) (—23925.26, —1948,40, 2698.97, 2527.36, 2945.05), 0.0103 (b) 0.0141 (c) 0.0141 
12.10 22085 


Chapter 13 


13.1 
13.2 
13.4 
13:5 
13.7 
13.8 


29340, 9.85 

(a) 7111.08, (b) 612.74 

51285, 1022.36 

New coefficients are (69051, 76751, 85128) 
0.88 

1.085 


Chapter 14 


14.1 


14.2 
14.3 
14.4 
14.5 
14.6 


14.7 
14.8 
14.9 
14.13 


f(1) = 1/4, f(2) = 3/8, f(3) = 9/32, f(4) = 3/32. Time 2 is the most likely cause of
failure.


f(k) = 2k3*/(k + 3)! 
el 

(a) 1.27 (b) 0.8357 
1335 


Machine 1, which has an expected output of 400 000 copies, as opposed to 360 000 
for Machine 2. 


(a) 0.1215 (b) An overstatement of 0.0008 

(b) uj (t) = ?t/ (1+ Bt) (c) This is a mixture of an exponential and a gamma. 
(a)0.003(10—2?2, 0€ t € 10. (b) t/25 — 32/1000, 0€ t< 10 

(a) Yes (b) No 


Chapter 15 


15.1 
15.2 
15.3 
15.4 
15.5 


1.1070, 0.0041 

(a) 1.86, 0.7204 (b) 0.70 
1 

3; 83 

7.5, 3.75 

16.67, 222.22 
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15.6 The exact distribution of į L is given by 
k P(T =k) L 
1 0.20 —100 
2 0.20 —180 
2 0.18 —244 
23 0.42 268 
Therefore ,V = 12.64. 
15.7 (a) —13.76,1840.34, 1344 (b) 0.49 
15.8 L = 50 with probability 0.2, 10 with probability 0.32, and —2.8 with probability 0.48. 
So oV = 11.856. 
15.9 (a) 271 (b) 0.933 
15.10 0.45, 0.675 
15.11 0.0248 
15.12 (a) 17, 28.5 (b)14.78, 0.30 (c) 8: 
15.13 (a) L = 160 with probability 0.2; 40 with probability 0.32; —20 with probability 0.24; 
—45 with probability 0.24. ;Z = 160 with probability 0.4; 40 with probability 0.3; 
—10 with probability 0.3. So ,V = 73. 
(b) P = 665 
15.15 (a) 483.39 (b) 458.02 
15.16 (a)13 (b)0.07 
15.17 E(Z) = 5445  Var(Z) = 341.80 
15.18 u/(u*à—y) u/(u*28-2y) - (uf(u à - yy 
15.19 (a) 900 + 25c^2 (b) 900 - 300c + 25c^2, which equals 0 for c = 6
1520 (agY-2Tfor0xT €6andY 26 whenT 26 (b)3.5,4.25 (c)0.5775 
15.21 676 
15.23 (a) f(t) 2 t when0 € t « 1and 1/2 when 1 «1? € 2. (b) 2.075, 4.891 
15.24 26 870. 
Chapter 16 
16.1 0.196 
16.2 0.32 
16.3 20 
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16.4 
16.5 


16.6 
16.7 
16.8 
16.9 
16.11 
16.12 
16.13 
16.14 




0.04902 


(a) £2/(8--8 (b) B+E (c) B°/(28 + 4) 
(d) f?k/(1 + BIOQ + à) 


0.24 

(a) 0.5931 (b)0 

(a) 0.6376 (b) 0.8187 

Fz(z) = 0.76 for 0 < z < 0.1296, 0.4 + z!/? for 0.1296 < z < 0.36, 1 for 0.36 < z. 
(a) 0.25 (b) 0.882 (c) 1 

(a) 0.8521 (b) 0.2219 

(b) Not true. Subtract 2s. from right hand side. 


Change both occurrences of Fy to Fy. 


Chapter 17 


17.1 


17.2 


17.3 
17.4 


17.5 
17.6 
17.7 


17.8 


17.9 
17.10 


F(t, 1) = F(t,2) = aP- P,O « t € l, and ; when t » 1. 
F(t, 1) = SP -B,0<t< 1 and 5 when t > 1. 


(a) F(t, D = $(1 - e75), F(,2) = 4- e). (b) f,(1) = 2/5, fjQ) = 3/5. 

For 0<t<a, F(t,2)=(l-e)/ya. For t>a, F(t,2)=PU=2)=(1- 
eM") / pua. 

15/16 

(a) 21/40 (b) 4/10 


(a) For0xt« 1, F(t,1) — =U -ü-20*,  F(t,2)= z Sipe, 
For t > 1, F(t, 1) = 1/3, F(t,2) = 2/3. 


(b) AD = 1/3, f,(2) = 2/3 
(c) wt, 1) = 4/3 — D, n(652) = 6/00 - 0 — 7) 


(a) For 0 € t< 1, F(t,1) = (8t —215)/9, F(t,2) = (2 —1*5)/3. For t > 1,F(t,1) = 
2/3, F(t,2) = 1/3. 


(b) HA) = 2/3,f,Q) = 1/3 

(c) 14 (0 = (8(1 — 8)/(9 — 8t — 677 + 51^) w(t) = 12(t — 87) /(9 — 8t — 67? + 51^) 
utj) 2 2/0 - t), u( = Q—6r-62)/0 — 2+ 3 20), 0€ («1 

utj) = Gr - 32)/1 — 3? + 20), u(t) = (6t 32)/2 - 32 P) 


17.11 
17.12 
17.13 
17.14 
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@)a/(@ +P- A-A] O<t<1 (bf/(-p) (ca/( B1) 
(a) 0.2884 + 0.0473 + 0.1119 = 0.4476 (b) 0.2346 
(a) 0.7520 |. (b) 0.1654 (c) 0.0827 


In all cases f(0, 1) = f(1,0) = 1/2 — f(0,0), (1, 1) = f(0, 0). 
(a) f(0, 0) = 1/2 (b) /(0,0) = 0 (c) f(0,0) = 1/4 (d) f(0, 0) = 0.2497, 0.0139, 0.4861 
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18.1 
18.2 


100. None apply. 


For i#0,7, p; =? /P, py-2ir-ip/r, Prii (r-1?/r, p0,1) = 
pír,r—1)=1 


1/2 1/2 0 
(3| 1/3 1/3 1/3 (b) 55/216 (c) 2/7, 3/7, 2/7) 
0 1/2 1/2 
0,1,3,4, are recurrent, 2 is transient. 
All are recurrent. 
Only 3 is recurrent. 
q/(p + 4). p/p * q) 
Limiting distribution is uniform for 5 chairs, does not exist for 4. 
4/5, 6/5, 7/5 
(a) 5 feet, 21/76 (b) 5 feet, 28/93 
(a)0.5e7! (b)I—e-U2 (e)5/3, 5/9. 
(a) 4.5e7? (b) Exp(3), Gamma(n,3) (c) 54e7?. 
(a) 0.012 74. (b) 0.26424 (c) 9,7 


(a) Increments not stationary (ii) If increments stationary, not independent 
(c) Given the first number, the second must be — log(0.5)/2 


(a)2.5e! (b)I—e-9^ (c)3.3,3/4. 

(a)2e? (b)2e?2 (c)48 (d) 18/125 (e)(7/2)e57 
7/3,9 

0.030 904 

(a)72 (b)0.2875 

0.434 
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18.22 
18.23 
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0.121 
0.363 
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19.1 
19.2 
19.3 
19.4 
19.7 
19.10 
19.11 


19.13 
19.14 
19.15 
19.16 


1.2808 

(a) 82.69, (b) 2036.80 

0.333, 0.249 

(a) 45.24 (b) (i) 71.11 (ii) 14.39 (iii) 21.88 
(a) 0.116, (b) 0.950, 0.1658 

(a) 69.60 (b) 4.177 


(c) Eigenvalues are -(a + b), -a, -b with respective eigenvectors of (1, 1, 1),
(0, 1, 0), (0, 0, 1).


5/81, 20/91 
9, 24, 94 
—15420, 5350 


7,005 7 2,005: 35 . 3,005: 28 
(a) Reserves for states O, 1, 2 (resp.) are Ti z 3e iog? Je 108 
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20.1 
20.2 
20.3 
20.5 
20.6 


20.7 


20.8 
20.9 


20.10 
20.11 
20.14 
20.16 


0.04 « r « 0.10. 

For example, take initial portfolio of (64.6, —0.7, 1). 
(a) 12.54 (b) 11.45 

Cost is 7.46 compared with 6.375 for European. 


(a) Probabilities of (U, M, D) resp. are (p, (2 — 5p)/3, (2p + 1)/3),0 « p « 4. 
(b) 62/3 « z « 12, (c) For example take initial portfolio of (—48, 0.6, —1). 


(a) Arbitrage-free, not complete. (b) L = (f : 2/(U) — 3f(M) + f(D) = 0Lo = {f : 
f(U) = —9a,f(M) = a, f(D) = 21a) for some a. 


(d — (1 + r))/(u — (1 + r)) in both cases. 


Calls: (1.24, 4.75. 11.86), Puts: (9.61, 3.26, 0.52). Calls decrease with strike, increase 
with interest, duration and volatility. Puts increase with duration, strike, volatility, 
decrease with interest. 


(a) 5 (b) 8.07 (c) 0.70 

(a) (0.707, —0.643, (b) (0.803, —0.761), (c) (0.504, —0.463). 

v(0, 2) = 0.525, v0, 3 = 0.42525, v(1, 3(U) = 0.68, v(1, 3)(D) = 0.39. 
(a) 14 (b) 47.25 
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21.1 
21.2 
21.3 
21.4 
21.5 
21.6 
21.7 
21.8 
21.9 
21.10 
21.11 
21.12 
21.14 
21.15 
21.16 
21.17 
21.18 
21.19 
21.20 
21.21 
21.22 
21.23 
21.24 
21.25 
21.26 
21.27 
21.28 
21.29 
21.30 


56 

0.1064 

0.054 

5.7251 

120, 25 600 

32/3, 224/3 

(a) 0.102 (b) No 
3/2, 9/4 

(a) 0.1025 (b)11.75 
(a) 2/3,4/9 (bye 
0.753 

(a) 0.1590 (b) 9/16 
0.95 

0.75(X ^ 3840) — 0.75(X ^ 240) 
0.0996 

4.0625 

200 

The deductible 
1600.74 

0.0022 

180.60 

(a) 106.53 (b) 62.5 
4.44, 55.31 

114.40 

2400 

()1—e^ (b)I—[0/(0-- d)! (c)I—(Q-fd)/2)e 94 
(b) 0.2769 

fs(12) = Sd xc 


Negative binomial 
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21.31 


21.32 
21.33 
21.34 
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x 0 1 2 3 4 5 


Poisson 0.0111 0.0350 0.0701 0.1051 0.1301 0.1387 
Negative binomial 0.0442 0.0696 0.0968 0.1082 0.1110 0.1050 
Binomial 0.0020 0.0123 0.0397 0.0858 0.1378 0.1737 


0.0693 
Negbin(2, 0.5) 
N ~ Poisson(6). X takes values 2, 3, 5 with probabilities 1/3, 1/6, 1/2, respectively. 
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22.1 
22.3 
22.4 
22.6 
22.7 
22.8 
22.9 
22.10 
22.11 
22.12 
22.14 


(c) Pog = 24.70, Poo, = 15.86, lim,_,,, P, = 10 

Y is less risky than X 

Y and Z are both less risky than X, but Y and Z are incomparable. 
(a) W(X) 23, A(Y)=2.5 

TVaR(X) = 4 14/49. 

(a) 1, 4/3 (b) 7/4, 11/6 

Na, N(i+a)/2 

3,5 1/7 

117.86 

(a) 3.07 3.07, 5.73 


(a) The mean 
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23.1 
23.2 
23.3 
23.5 
23.10 
23.12 
23.13 
23.14 
23.15 


(a) 0.18 (b) 1.6a — 0.4b — 0.2c 
(a) 10 (b) 11 (c) 1024 

0.4825 

(c) Q/3*. (d) (2/3) 

wi (1) = 7/36, wo(1) = 1/36 

(a) R > 0.23026 (b)45 (c)3/4 
0.1535 

(a) 0.4160 (b) 3/16 


Strictly greater than, provided 0 > 0. 


23.17 
23.19 
23.20 


23.21 


(a)u> 1.4979 (b)7/8 (c) 0.7239 (d)3/4 


(b) Ze —u/3 4 + ze e 44/3 
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24.1 
17/6
24.2 2.9
24.3 0.9747
24.4 12.89
24.5 (a) 2.61 (b) 3.53 (c) 3.53
24.6 1.5
24.7
24.8
m()
P (4)
24.9 3.107
24.10 1.1929
24.11 3
24.12 (a) 5.75
(b) \[ \frac{\int_1^{\infty} \theta^{17} e^{-2\theta}\,d\theta}{\int_1^{\infty} \theta^{16} e^{-2\theta}\,d\theta} \]
24.13 3/2
24.14 4391
24.15 1.06
24.16 (a) (4.95, 6.24, 5.81)
(b) (4.08, 6.61, 5.77, 1.54)
24.18
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Appendix 


A review of probability theory 


This appendix provides a review of the basic probability theory used in this book. In addition, 
more specialized topics in probability theory will appear in the relevant chapters as they are
needed. It is expected that most readers will be at least somewhat familiar with this material, 
so the pace is fairly rapid, with few examples or derivations. 


A.1 Sample spaces and probability measures 


We model the results of a random experiment by a set $\Omega$, known as a sample space, where
each point of $\Omega$ corresponds to a possible outcome. For example, if we throw a pair of dice,
the sample space could consist of the 36 ordered pairs $(a, b)$ where $a$ and $b$ take values from 1
to 6. A combination of outcomes, known as an event, is represented by a subset of $\Omega$. Such an
event occurs if any of the outcomes in the subset occurs. In the above example, the event that
the total on the dice is 10 would be represented by the set $\{(4, 6), (5, 5), (6, 4)\}$. From a given
collection of events, familiar set operations can be applied to build other events. The union
of events, denoted with the symbol $\cup$, gives us the event that occurs if any one of the given
events occurs. The intersection of events, denoted with the symbol $\cap$, gives us the event that
occurs if all of the given events occur. The complement of an event $A$, denoted by $A^c$, gives
us the event that occurs if $A$ does not occur. Two events are said to be mutually exclusive if
the occurrence of one means that the other cannot occur. These are represented by subsets $A$
and $B$ that are disjoint, that is, $A \cap B = \emptyset$ (the empty set).
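
The dice example can be enumerated directly. The following Python sketch is only an illustration: it assumes fair dice, so that all 36 outcomes are equally likely (an assumption not made in the text above), and the names `omega` and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for a pair of dice: the 36 ordered pairs (a, b).
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of omega), assuming equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

# Events are subsets of the sample space.
total_is_10 = {(a, b) for (a, b) in omega if a + b == 10}
first_is_even = {(a, b) for (a, b) in omega if a % 2 == 0}

union = total_is_10 | first_is_even          # at least one of the events occurs
intersection = total_is_10 & first_is_even   # both events occur
complement = omega - total_is_10             # the total is not 10

print(sorted(total_is_10))   # [(4, 6), (5, 5), (6, 4)]
print(prob(total_is_10))     # 1/12
```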

To each event A we assign a number P(A) in the interval [0,1], known as the probability 
of A. This measures how likely it is for the event A to occur in a single trial of the experiment. 
The assignment $P(A)$ must satisfy the following fundamental rule: Given a finite or countably
infinite sequence $A_1, A_2, \ldots, A_n, \ldots$ of pairwise disjoint events, that is, $A_i \cap A_j = \emptyset$ for
$i \neq j$, then

\[ P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i). \tag{A.1} \]

Moreover, we require that $P(\Omega) = 1$.

These requirements can be motivated by adopting the relative frequency interpretation of 
P, whereby P(A) gives the expected proportion of times that the event A will occur if the 
experiment is repeated a sufficiently large number of times. 

We often refer to $P$ as a probability measure on $\Omega$.

The following are some simple, frequently used consequences of (A.1):

\[ P(\emptyset) = 0, \qquad A \subseteq B \text{ implies } P(A) \le P(B), \qquad P(A^c) = 1 - P(A), \tag{A.2} \]

and

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B). \tag{A.3} \]

We now discuss a technical difficulty that arises when the sample space is not finite or 
countably infinite. In this case, postulating that P(A) should be defined for all subsets A is 
overly restrictive, and would prevent one from finding suitable probability measures in many 
cases. Accordingly we only require that P(A) be defined when A belongs to a certain specified 
collection of subsets, which we denote by S. We do, however, require certain restrictions on S 
to ensure that countable families of events can be combined in the familiar ways we described 
above. These requirements are as follows: 


(a) Given any finite or countably infinite sequence $A_1, A_2, \ldots, A_n, \ldots$ of sets in $S$,
(i) their union is in $S$,
(ii) their intersection is in $S$.

(b) For any $A \in S$, the complement $A^c \in S$.

(c) The whole set $\Omega \in S$.


A collection of subsets with these properties is known as a $\sigma$-field. (Since (a)(ii) follows
from (a)(i) and (b), while (a)(i) follows from (a)(ii) and (b), many authors omit one of (a)(i)
or (a)(ii) in the definition of a $\sigma$-field.) As a consequence of (a)(i) we know that the left hand
side of (A.1) makes sense.

To summarize, in modelling a random experiment, we choose a sample space $\Omega$, a probability
measure $P$ and a $\sigma$-field $S$. When $\Omega$ is finite or countably infinite, the collection $S$ is
invariably taken to be all subsets. In the remaining sections of this appendix, we assume a
fixed $\Omega$, $P$ and $S$.
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A.2 Conditioning and independence 


If we are told that a certain event $B$ has occurred, we need to reassess probabilities. Assuming
$P(B) > 0$, this is done by defining a new probability measure, denoted by $P(\cdot \mid B)$, as follows:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad A \in S. \]

We often refer to the left hand side as ‘the probability of A given B’. This new measure assigns
probability 0 to events disjoint from $B$, and for $A \subseteq B$ it assigns the probability $P(A)/P(B)$. In
a sense, we can view this as changing the sample space from $\Omega$ to $B$, and then dividing by a
constant to ensure that the probability of the entire space is 1.
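
As a minimal sketch of this definition, the following Python code conditions on an event for a pair of dice, again assuming equally likely outcomes. The function name `cond_prob` and the chosen events are hypothetical.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def cond_prob(A, B):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if prob(B) == 0:
        raise ValueError("conditioning event has probability 0")
    return prob(A & B) / prob(B)

A = {w for w in omega if sum(w) == 10}   # total is 10
B = {w for w in omega if w[0] == 6}      # first die shows 6
C = {w for w in omega if w[0] == 1}      # first die shows 1, disjoint from B

print(cond_prob(A, B))   # 1/6
print(cond_prob(C, B))   # 0, since C is disjoint from B
print(cond_prob(B, B))   # 1, the whole (new) sample space
```

Knowing that the first die shows 6 raises the probability of a total of 10 from 1/12 to 1/6.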

The symbol $P(A \mid B_1, B_2, \ldots, B_n)$ denotes the probability of $A$ given that all of the events
$B_i$ have occurred. That is, it equals $P(A \mid B)$ where $B = \bigcap_{i=1}^{n} B_i$.

We say that A and B are independent if 


P(A|B) = P(A). 


In other words, knowing that B occurred does not affect our original assessment of the 
likelihood of A. It is convenient to write this condition in the symmetric form 


\[ P(A \cap B) = P(A)P(B), \]


which shows immediately that we also have P(B|A) = P(B). Moreover, this form makes sense 
if either set has probability 0, in which case we automatically have independence. 
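
The symmetric product form is convenient for checking independence numerically. The sketch below tests it for events on a pair of dice, assuming equally likely outcomes; `independent` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def independent(A, B):
    """Check the product form P(A and B) = P(A) P(B)."""
    return prob(A & B) == prob(A) * prob(B)

first_even = {w for w in omega if w[0] % 2 == 0}
total_7 = {w for w in omega if sum(w) == 7}
total_10 = {w for w in omega if sum(w) == 10}

print(independent(first_even, total_7))    # True
print(independent(first_even, total_10))   # False
print(independent(first_even, set()))      # True: an event of probability 0
```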

More generally, we define a collection of events (possibly infinite) to be independent if,
given any finite subcollection $A_1, A_2, \ldots, A_n$,

\[ P\Big(\bigcap_{i=1}^{n} A_i\Big) = P(A_1)P(A_2)\cdots P(A_n). \tag{A.4} \]


A.3 Random variables 


Random variables are, intuitively, numerical quantities associated with a random experiment.
For example, we toss 100 coins and count the number of heads. Formally, we represent
the random variable by a real-valued function $X$ defined on $\Omega$, where for $\omega \in \Omega$, $X(\omega)$ is
the number obtained when the outcome of the experiment is $\omega$. The technical restriction
we referred to above when $\Omega$ is not finite or countably infinite also puts restrictions on the
possible functions we consider. Given a random variable, traditionally denoted by a capital
letter like $X$, and a subset $A$ of real numbers, we would like to find the probability that $X$ takes
a value in the set $A$. This is not possible in our model if the event $\{\omega \in \Omega : X(\omega) \in A\}$ is not
in the collection of events $S$ that we are able to compute probabilities for. Accordingly, in
the definition of a random variable we postulate that at least for $A$ of the form $(-\infty, r]$, the
corresponding event is in $S$, so we can always compute the probability that $X$ is less than or
equal to $r$. From the properties that we imposed on $S$ it then follows that the event $\{X \in A\}$
is in $S$ for a large collection of subsets $A$ of real numbers, known as the Borel sets. This class
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contains all intervals, and, roughly speaking, all sets that can be formed from intervals, by 
repeatedly taking unions and intersections over countable index sets. 


A.4 Distributions 


In many cases, given a random variable, we do not need the details of $\Omega$ or the particular
function $X$, but wish only to know the distribution of $X$, which is basically a description of
how likely it is that $X$ will take on certain values. We will concentrate on two main types
of random variables. There are discrete random variables, which in almost all applications
in this book will take nonnegative integers as values. Second, there are continuous random
variables, which take values that vary continuously over some interval of the real line. In
almost all of our applications, this interval will be $[0, \infty)$ or $[0, N]$ for some finite $N$. We will
therefore simplify the following discussion by assuming (unless otherwise mentioned) that
all our random variables take nonnegative values.

We will occasionally encounter mixed random variables that have both a discrete and 
continuous part. They will be dealt with in turn as they arise. 

For a discrete random variable $X$ there are two main functions for describing the distribution.
First, the probability (mass) function $f_X$ is a function defined on the nonnegative integers
by

\[ f_X(k) = P(X = k), \qquad k = 0, 1, 2, \ldots. \]

Second, the (cumulative) distribution function (c.d.f.) $F_X$ is defined on $[0, \infty)$ by

\[ F_X(x) = P(X \le x). \]

The two functions are related by

\[ f_X(k) = F_X(k) - F_X(k - 1), \quad \text{for all integers } k > 0, \]

\[ F_X(x) = \sum_{i=0}^{k} f_X(i), \]

where $k$ is the greatest integer $\le x$.
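
The two relations can be illustrated with a small example. The sketch below uses a hypothetical probability function on {0, 1, 2, 3} (that of the number of heads in three fair coin tosses) and builds the distribution function from it.

```python
from fractions import Fraction
from math import floor

# Hypothetical probability function on the nonnegative integers 0..3.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(x):
    """F(x) = sum of f(i) for i = 0, ..., k, where k is the greatest integer <= x."""
    if x < 0:
        return Fraction(0)
    k = floor(x)
    return sum((p for i, p in pmf.items() if i <= k), Fraction(0))

print(cdf(1.7))            # 1/2, the same as cdf(1)
print(cdf(2) - cdf(1))     # 3/8, which recovers f(2)
```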

We can define the distribution function $F_X$ exactly as above for any random variable.
The precise definition of a continuous random variable is one for which the function $F_X$
is continuous. In this case we get no information from looking at the probability that $X$
will take on a specific value, as this will always be zero. ($P(X = r) \le P(X \in (r - h, r]) =
F_X(r) - F_X(r - h)$ for all $h > 0$, and this goes to 0 as $h \to 0$.) In place of the probability
function, we define a probability density function (p.d.f.) $f_X$ as the function satisfying


\[ P(a < X \le b) = \int_a^b f_X(x)\,dx. \tag{A.5} \]

The two functions are related in the continuous case by

\[ f_X(x) = F_X'(x), \qquad F_X(x) = \int_0^x f_X(t)\,dt. \tag{A.6} \]


(Primes denote differentiation when clear from the context.) 

The lower limit of 0 on the integral is a consequence of our assumption of nonnegative
values; in the general case we would need a lower limit of $-\infty$.

When there is no confusion, we will sometimes omit subscripts and simply write f or F. 

The reader is cautioned that values of the density function are not probabilities and can
take values greater than 1. We can interpret these probabilistically by the intuitive statement
that for a ‘small’ value of $\Delta x$, the probability that $X$ will take a value between $x$ and $x + \Delta x$ is
‘approximately’ $f(x)\Delta x$. This can be verified from (A.5).
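
A quick numerical check makes both points concrete. The sketch below uses an exponential distribution with rate 3, chosen purely for illustration; its density exceeds 1 near the origin, yet the density times a small interval length still approximates the interval probability well.

```python
import math

rate = 3.0   # illustrative choice, not taken from the text

def density(x):
    """Density of an exponential distribution with the given rate."""
    return rate * math.exp(-rate * x)

def cdf(x):
    """Distribution function of the same exponential distribution."""
    return 1.0 - math.exp(-rate * x)

x, dx = 0.1, 0.001
exact = cdf(x + dx) - cdf(x)   # probability of a value between x and x + dx
approx = density(x) * dx       # density times interval length

print(density(0.0))            # 3.0, a density value greater than 1
print(exact, approx)           # the two values agree closely
```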

Density functions need not exist for a given continuous distribution, but throughout the 
book it will be assumed that they do exist, unless otherwise indicated. 


A.5 Expectations and moments 


The expectation (also known as the mean) of a random variable $X$ represents in some sense
the average value that $X$ will take. It is given by

\[ E(X) = \sum_{k=1}^{\infty} k f_X(k) \quad \text{or} \quad \int_0^{\infty} x f_X(x)\,dx, \]

depending on whether $X$ is discrete or continuous.

Of course the above series or integral may diverge, in which case we have $E(X) = \infty$. (In
the general case where $X$ is not necessarily nonnegative-valued, the expectation may not exist
at all.)

If g is a function defined on a set that includes the range of X, we can define another 
random variable g(X) that takes the value g(x) when X takes the value x. We refer to such a 
random variable as a function of X. It can be shown that the expectation of a function of X is 
given by 


\[ E[g(X)] = \sum_{k=0}^{\infty} g(k) f_X(k) \quad \text{or} \quad \int_0^{\infty} g(x) f_X(x)\,dx, \tag{A.7} \]


depending on whether X is discrete or continuous. 
In particular, taking g(x) = cx for some constant c leads to the formula 


E(cX) = cE(X). (A.8) 
Of particular importance are the functions $g(x) = x^n$. For such a function $E[g(X)]$ is known
as the $n$th moment of $X$.


We define the variance of $X$ by

\[ \mathrm{Var}(X) = E[(X - E(X))^2] = E(X^2) - E(X)^2. \tag{A.9} \]

It is clear from (A.8) that for any constant $c$

\[ \mathrm{Var}(cX) = c^2\,\mathrm{Var}(X). \tag{A.10} \]


The positive square root of Var(X) is known as the standard deviation of X. 
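
The two expressions for the variance, and the fact that multiplying $X$ by a constant $c$ multiplies the variance by $c^2$, can be checked on a small example. The probability function below is hypothetical, and the helper names are illustrative.

```python
from fractions import Fraction

# Hypothetical probability function on 0..3.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(pmf, g=lambda k: k):
    """E[g(X)] as the sum of g(k) f(k) over the support."""
    return sum(g(k) * p for k, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E(X))^2]."""
    mean = expectation(pmf)
    return expectation(pmf, lambda k: (k - mean) ** 2)

mean = expectation(pmf)
second_moment = expectation(pmf, lambda k: k ** 2)

c = 4
scaled = {c * k: p for k, p in pmf.items()}   # probability function of cX

print(variance(pmf), second_moment - mean ** 2)   # 3/4 3/4
print(variance(scaled))                           # 12, which is c**2 * 3/4
```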

The smaller the variance, the more likely it is that values of $X$ are close to the mean. A
formal statement along these lines is given by Chebyshev’s inequality, which states that for
$k > 0$,

\[ P(|X - E(X)| \ge k) \le \frac{\mathrm{Var}(X)}{k^2}. \tag{A.11} \]
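
Chebyshev's inequality is a bound, usually not a sharp one. The following sketch compares the exact probability of a deviation of at least $k$ from the mean with the bound $\mathrm{Var}(X)/k^2$, for a hypothetical probability function and a few illustrative values of $k$.

```python
from fractions import Fraction

# Hypothetical probability function on 0..3 (mean 3/2, variance 3/4).
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(k * p for k, p in pmf.items())
var = sum((k - mean) ** 2 * p for k, p in pmf.items())

def deviation_prob(k):
    """Exact probability that X is at least k away from its mean."""
    return sum((p for x, p in pmf.items() if abs(x - mean) >= k), Fraction(0))

for k in (Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2)):
    exact = deviation_prob(k)
    bound = var / k ** 2
    print(k, exact, bound)   # exact never exceeds bound
```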


Up to this point, we have talked first about events and probabilities, and second about 
random variables and expectations. It is interesting to note that, if we wish, we can subsume 
events and probabilities under the latter category, and speak only of random variables and 
expectations. For any event $A$ we define a random variable $I_A$ that takes the value 1 on points
in $A$ and the value 0 on points not in $A$. These are known as indicator random variables. In
this way the event $A$ can be viewed as a random variable and $P(A)$ is equal to $E[I_A]$, as can be
easily verified. 


A.6 Expectation in terms of the distribution function 


It is often desirable to express expectations in terms of the distribution function rather than the
density or probability function. In place of $F$, it is more convenient to work with the function

\[ s(t) = 1 - F(t) = P(X > t). \]

For a simple example, consider a discrete random variable $X$:

\[
\begin{aligned}
E(X) &= f(1) + 2f(2) + 3f(3) + \cdots \\
     &= f(1) + f(2) + f(3) + \cdots \\
     &\phantom{= f(1)} + f(2) + f(3) + \cdots \\
     &\phantom{= f(1) + f(2)} + f(3) + \cdots
\end{aligned}
\]

Summing by rows, we get

\[ E(X) = \sum_{k=0}^{\infty} s(k). \tag{A.12} \]

In the continuous case we can similarly derive

\[ E(X) = \int_0^{\infty} s(t)\,dt. \tag{A.13} \]

We will in fact develop a more general formula. Suppose $g$ is a differentiable function such
that $E[g(X)]$ exists and is finite. Noting that $f(t) = -s'(t)$, we integrate by parts to obtain

\[ \int_0^N g(t) f(t)\,dt = -g(N)s(N) + g(0) + \int_0^N g'(t)s(t)\,dt. \]

In the case where

\[ g(N)s(N) \to 0 \quad \text{as } N \to \infty, \tag{A.14} \]

we can conclude that

\[ E[g(X)] = g(0) + \int_0^{\infty} g'(t)s(t)\,dt. \tag{A.15} \]
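
As a rough numerical check of this formula, the sketch below takes an exponential distribution with rate 2 and $g(t) = t^2$, for which the second moment is known to be $2/\lambda^2 = 0.5$. The distribution, the midpoint rule and the truncation point are all illustrative assumptions.

```python
import math

rate = 2.0   # illustrative exponential rate

def survival(t):
    """s(t) = P(X > t) for the exponential distribution."""
    return math.exp(-rate * t)

def g(t):
    return t * t

def g_prime(t):
    return 2.0 * t

def midpoint_integral(func, a, b, steps):
    """Simple midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

upper = 20.0   # truncation point; the integrand is negligible beyond it
value = g(0.0) + midpoint_integral(lambda t: g_prime(t) * survival(t), 0.0, upper, 20000)
exact = 2.0 / rate ** 2   # known second moment of the exponential distribution

print(value, exact)   # both close to 0.5
```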


A similar derivation in the discrete case shows that, for any function $g$ satisfying (A.14)
for integer values of $N$,

\[ E[g(X)] = g(0) + \sum_{k=0}^{\infty} [g(k + 1) - g(k)]\,s(k). \tag{A.16} \]
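
The discrete formula can be verified exactly on a small example. The sketch below uses a hypothetical probability function and $g(k) = k^2$, and compares the direct computation of $E[g(X)]$ with the sum of differences weighted by $s(k)$.

```python
from fractions import Fraction

# Hypothetical probability function on 0..3.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def survival(k):
    """s(k) = P(X > k)."""
    return sum((p for x, p in pmf.items() if x > k), Fraction(0))

def g(k):
    return k ** 2

# Direct computation of E[g(X)].
direct = sum(g(k) * p for k, p in pmf.items())

# Computation via differences of g; s(k) = 0 beyond the largest value of X.
via_survival = g(0) + sum((g(k + 1) - g(k)) * survival(k) for k in range(max(pmf) + 1))

print(direct, via_survival)   # 3 3
```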


Condition (A.14) is automatically satisfied if $g$ is bounded, since $s(N)$ tends to 0 as $N$ tends
to $\infty$, or if $X$ is bounded, since then $s(N) = 0$ for sufficiently high $N$. It is also satisfied for
monotone $g$. Suppose, for example, $g$ is increasing and nonnegative. Then (in the continuous
case),

\[ g(N)s(N) = g(N)\int_N^{\infty} f(t)\,dt \le \int_N^{\infty} g(t) f(t)\,dt, \]

and the last term must approach 0, by the hypothesis that $E[g(X)] = \int_0^{\infty} g(t) f(t)\,dt$ is finite.


A.7 Joint distributions 


Suppose we have two random variables X and Y defined on the same sample space. For many 
applications, we are interested not only in the distribution of each random variable, but also 
in the joint distribution, which gives information on how the two are related. When both are 
discrete, we can describe this by a joint probability function. This is a two-variable function 
$f_{X,Y}$ defined by

\[ f_{X,Y}(k, n) = P(X = k \text{ and } Y = n). \]

(As in the single-variable case, we sometimes omit the subscripts on $f$ when no confusion
results.) When $X$ and $Y$ are both continuous we define a joint density function $f_{X,Y}$. This is a
two-variable function, such that for suitable regions $A$ in the plane, the probability that the
point $(X, Y)$ lies in $A$ is given by

\[ \iint_A f_{X,Y}(x, y)\,dx\,dy. \tag{A.17} \]


The probability or density functions associated with the component random variables 
are obtained from the corresponding joint function as the so-called marginal distributions, 
namely 


\[ f_X(k) = \sum_{n=0}^{\infty} f_{X,Y}(k, n) \quad \text{or} \quad f_X(x) = \int_0^{\infty} f_{X,Y}(x, y)\,dy, \tag{A.18} \]


according as the random variables are both discrete or both continuous. 

As in the single random variable case, we assume that joint density functions exist unless
otherwise indicated, but readers should be cautioned that for two continuous random variables X
and Y, a joint density function need not exist even when both f_X and f_Y do. For example, if
X = Y, then (X, Y) lies in the region A = {(x, y) : x = y} with probability 1, but the double
integral of any two-variable function over A will equal 0.

For a function g of two variables defined on a set that contains all the points (X(ω), Y(ω))
where ω is in the sample space Ω, it can be shown, analogously to (A.7), that

\[ E[g(X, Y)] = \sum_{k=0}^{\infty}\sum_{n=0}^{\infty} g(k, n)\,f_{X,Y}(k, n) \tag{A.19} \]

in the discrete case, or

\[ E[g(X, Y)] = \int_0^{\infty}\!\!\int_0^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dx\,dy \tag{A.20} \]


in the continuous case. An important example is the function g(x, y) = xy, which is used to
define the covariance of X and Y given by

\[ \mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y). \tag{A.21} \]

When Cov(X, Y) > 0, the two random variables tend to move in the same direction:
high (low) values of one tend to accompany high (low) values of the other. Cov(X, Y) < 0
means the two random variables tend to move in opposite directions. When Cov(X, Y) = 0, we say
that X and Y are uncorrelated.

For another frequently used example, take g(x, y) = x + y. Then 


\[
\begin{aligned}
E(X + Y) &= \int_0^{\infty}\!\!\int_0^{\infty} (x + y)\,f_{X,Y}(x, y)\,dx\,dy \\
&= \int_0^{\infty}\!\!\int_0^{\infty} x\,f_{X,Y}(x, y)\,dx\,dy + \int_0^{\infty}\!\!\int_0^{\infty} y\,f_{X,Y}(x, y)\,dy\,dx.
\end{aligned}
\]

(Note the interchange of variables in the second integral.) Now using (A.20) we derive the
often used result that

\[ E(X + Y) = E(X) + E(Y). \tag{A.22} \]

We need not confine ourselves to just two random variables, and may want to consider
functions of n random variables defined on the same sample space, where n is any positive
integer. In the case of a sum, formula (A.22) extends by induction to

\[ E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i). \tag{A.23} \]


Another key formula involving a sum of random variables is 


\[ \mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2\sum_{i<j} \mathrm{Cov}(X_i, X_j). \tag{A.24} \]
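
To see (A.21) and (A.24) at work in the two-variable case, the sketch below computes the variance of X + Y both directly and through the covariance formula. The joint probability function is a made-up example (an assumption for illustration only), and `expect` is a hypothetical helper that implements (A.19).

```python
from fractions import Fraction as Fr

# Hypothetical joint probability function of (X, Y); not taken from the text.
joint = {(0, 0): Fr(3, 10), (0, 1): Fr(1, 10), (1, 0): Fr(2, 10), (1, 1): Fr(4, 10)}

def expect(g):
    """E[g(X, Y)] computed as the double sum in (A.19)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex**2
var_y = expect(lambda x, y: y * y) - ey**2
cov = expect(lambda x, y: x * y) - ex * ey                        # (A.21)
var_sum = expect(lambda x, y: (x + y) ** 2) - (ex + ey) ** 2      # direct calculation

print(var_sum, var_x + var_y + 2 * cov)                           # both sides of (A.24)
```

With these assumed probabilities the covariance is 1/10 and both sides of (A.24) equal 69/100.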


A.8 Conditioning and independence for random variables 
The notions of independence and conditioning can be extended from events to random variables.
We say that two random variables X and Y are independent if their associated events
are independent. That is, for any two Borel sets A and B,

\[ P(X \in A \text{ and } Y \in B) = P(X \in A)\,P(Y \in B). \]

As in (A.4), we can extend this definition to any collection of random variables by requiring
that, given any finite number of random variables X_1, X_2, ..., X_n from this collection and any
Borel sets A_1, A_2, ..., A_n, we have

\[ P(X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n) = P(X_1 \in A_1)\,P(X_2 \in A_2) \cdots P(X_n \in A_n). \]

Equivalent formulations can be made in terms of the joint probability or density, or
distribution functions. For example, in the bivariate case, the random variables X and Y are
independent if and only if for all points (x, y),

\[ f_{X,Y}(x, y) = f_X(x)\,f_Y(y), \tag{A.25} \]

or

\[ F_{X,Y}(x, y) = F_X(x)\,F_Y(y). \tag{A.26} \]

This leads immediately to the fact that

\[ E(XY) = E(X)E(Y) \quad \text{if } X \text{ and } Y \text{ are independent.} \tag{A.27} \]

As we would expect, two independent random variables are uncorrelated. However, X and
Y can be uncorrelated without being independent (see Section 15.7).

If X is a random variable and B an event with P(B) > 0 we can speak of the random 
variable X|B. This is just the restriction of X to the sample space B. Then 


E(X|B) = the expectation of X|B (A.28) 


with respect to the probability measure P(· | B).

Conditioning is often a useful tool in computing an expectation, since we can break
up the calculation into various cases. Formally we have what is known as the law of total
expectation, which states that given a partition of Ω, that is, a collection of pairwise disjoint
sets B_1, B_2, ..., B_n, with union Ω,

\[ E(X) = \sum_{i=1}^{n} E(X \mid B_i)\,P(B_i). \tag{A.29} \]

Taking X to be I_A, the indicator of an event A, gives the law of total probability

\[ P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i). \tag{A.30} \]


A.9 Moment generating functions 


Given any random variable X, the moment generating function (m.g.f.) is a function M_X of a
real variable t defined by

\[ M_X(t) = E[e^{tX}], \tag{A.31} \]

provided that E[e^{tX}] exists in some neighbourhood of 0. (There are distributions for which
E(e^{tX}) = ∞ for all positive values of t, in which case the m.g.f. will not exist in any
neighbourhood of 0.)

Recalling the series expansion

\[ e^{x} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots, \]

so that

\[ e^{tX} = 1 + tX + \frac{t^2X^2}{2!} + \frac{t^3X^3}{3!} + \cdots, \]

we see that the m.g.f. has the series expansion

\[ M_X(t) = 1 + tE(X) + \frac{t^2E(X^2)}{2!} + \frac{t^3E(X^3)}{3!} + \cdots, \]

from which we obtain

\[ E(X^n) = \text{the } n\text{th derivative of } M_X(t) \text{ evaluated at } t = 0. \tag{A.32} \]

Some important facts about m.g.f.'s are as follows.

Theorem A.1 For X and Y independent,

\[ M_{X+Y}(t) = M_X(t)\,M_Y(t). \]

Proof. Using (A.27) we have

\[ M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX}e^{tY}) = E(e^{tX})E(e^{tY}) = M_X(t)\,M_Y(t), \]

where we invoke the fact that functions of independent random variables are themselves
independent to justify the third equality.


The second theorem, which we state without proof, tells us that the m.g.f. determines the 
distribution when it exists. 


Theorem A.2 (The uniqueness theorem) Suppose that, for two random variables X and
Y, M_X(t) and M_Y(t) are equal in some neighbourhood of zero. Then X ~ Y.


This theorem does not say that one can easily compute the distribution function or density 
function from the m.g.f. The idea is rather that if one recognizes an m.g.f. as being that of 
a certain known distribution, then the distribution in question must be that distribution. An 
example of its use will appear in Section A.11.2 below. 


A.10 Probability generating functions 
For discrete distributions that take nonnegative integer values, it is often more convenient to
use another function closely related to the m.g.f. This is known as the probability generating
function (p.g.f.) and is defined by

\[ P_X(t) = \sum_{k=0}^{\infty} f_X(k)\,t^k. \tag{A.33} \]


This should not be confused with the probability function as defined above. 
Given the p.g.f., we can immediately find the complete distribution by

\[ f_X(k) = \frac{P_X^{(k)}(0)}{k!}, \]

where the superscript (k) denotes the kth derivative with respect to t.
The p.g.f. is related to the m.g.f. because we can write it as 


\[ P_X(t) = E(t^X) = E(e^{X\log t}) = M_X(\log t), \tag{A.34} \]

and it follows that

\[ M_X(t) = P_X(e^t). \tag{A.35} \]
If we find the p.g.f. of a discrete distribution, (A.35) immediately gives us the m.g.f. as well. 
The following technique will illustrate a good way to remember the p.g.f.'s of familiar
distributions. Suppose we start with any function g whose power series expansion about zero
has all nonnegative coefficients (i.e., all its derivatives are nonnegative at 0) and whose radius
of convergence is greater than 1. Then

\[ g(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots, \]

where a_n ≥ 0 for all n. Since

\[ g(1) = a_0 + a_1 + a_2 + \cdots, \]

such an expansion immediately gives rise to a probability distribution with probability
function

\[ f(k) = \frac{a_k}{g(1)}, \quad k = 0, 1, \ldots. \tag{A.36} \]

The resulting p.g.f. is immediate since, for any random variable X with this distribution,

\[ P_X(t) = \frac{1}{g(1)}\sum_{k=0}^{\infty} a_k t^k = \frac{g(t)}{g(1)}. \tag{A.37} \]


Particular applications of this idea are found in Sections A.11.2 and A.11.3 below. 

To calculate moments given the p.g.f., we could find the m.g.f. by (A.35) and then use 
(A.32), but we can also use (A.33) directly. Notice, for example, that 

\[ P_X'(1) = E(X), \tag{A.38} \]

and similarly

\[ P_X''(1) = E[X(X - 1)], \]

from which we can derive

\[ \mathrm{Var}(X) = E[X(X - 1)] + E(X) - E(X)^2 = P_X''(1) + P_X'(1) - P_X'(1)^2. \tag{A.39} \]
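
For a distribution on finitely many nonnegative integers the p.g.f. is a polynomial, so (A.38) and (A.39) can be checked mechanically. The sketch below does this for the probability function of Bin(3, 1/2), used here only as a convenient worked case; the helper names `derivative` and `evaluate` are our own.

```python
from fractions import Fraction as Fr

# Probability function of Bin(3, 1/2); f[k] = P(X = k) is the coefficient of t^k in (A.33).
f = [Fr(1, 8), Fr(3, 8), Fr(3, 8), Fr(1, 8)]

def derivative(coeffs):
    """Coefficients of the derivative of the polynomial sum(coeffs[k] * t**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, t):
    """Value of the polynomial sum(coeffs[k] * t**k) at t."""
    return sum(c * t**k for k, c in enumerate(coeffs))

p1 = evaluate(derivative(f), 1)                # P'(1)  = E(X), as in (A.38)
p2 = evaluate(derivative(derivative(f)), 1)    # P''(1) = E[X(X - 1)]
variance = p2 + p1 - p1**2                     # (A.39)

print(p1, variance)
```

The results, a mean of 3/2 and a variance of 3/4, agree with the values mp and mp(1 − p) for m = 3 and p = 1/2.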


We close this section by noting that (A.34) and (A.35) easily imply that the analogue of 
Theorem A.1 holds for the p.g.f. That is, if Z is an independent sum of X and Y, then 


\[ P_Z(t) = P_X(t)\,P_Y(t). \tag{A.40} \]


A.11 Some standard distributions


We review here the main features of some standard distributions. 


A.11.1 The binomial distribution 


Suppose we take m independent trials of a random experiment where the outcomes are 
classified as either ‘success’ or ‘failure’, and the probability of a success is p. The number of 
successes is a random variable with probability function 


\[ p(k) = \binom{m}{k} p^k (1 - p)^{m-k}, \quad k = 0, 1, \ldots, m. \]

This is known as a binomial random variable with parameters m and p, and we will denote it
by Bin(m, p). It can be generated, as in (A.36), by the function

\[ g(t) = (1 - p + pt)^m \]

(a binomial expansion, which is the source of the name). It follows from (A.37) to (A.39) that

\[ P_{\mathrm{Bin}(m,p)}(t) = (1 - p + pt)^m, \quad E[\mathrm{Bin}(m, p)] = mp, \quad \mathrm{Var}[\mathrm{Bin}(m, p)] = mp(1 - p). \tag{A.41} \]


A.11.2 The Poisson distribution 


The Poisson distribution arises as the limit of binomials where the mean is held constant.
Precisely, for any λ > 0 the Poisson(λ) distribution is the limit of Bin(m, λ/m) as m goes to
∞. It will have a probability function given by

\[
\begin{aligned}
p(k) &= \lim_{m\to\infty} \frac{m!}{k!\,(m-k)!}\left(\frac{\lambda}{m}\right)^{k}\left(1 - \frac{\lambda}{m}\right)^{m-k} \\
&= \frac{\lambda^k}{k!}\lim_{m\to\infty}\left[\left(1 - \frac{\lambda}{m}\right)^{m}\left(1 - \frac{\lambda}{m}\right)^{-k}\left(\frac{m(m-1)(m-2)\cdots(m-k+1)}{m^k}\right)\right].
\end{aligned}
\]

The first factor in the square brackets approaches e^{−λ}, and the other two each approach 1, so
we are left with

\[ p(k) = \frac{\lambda^k}{k!}\,e^{-\lambda}, \quad k = 0, 1, 2, \ldots. \]

This is generated as in (A.36) from the function g(t) = e^{λt}, so we can deduce from (A.37)
to (A.39) that

\[ P_{\mathrm{Poisson}(\lambda)}(t) = e^{\lambda(t-1)}, \quad E[\mathrm{Poisson}(\lambda)] = \lambda, \quad \mathrm{Var}[\mathrm{Poisson}(\lambda)] = \lambda^2 + \lambda - \lambda^2 = \lambda. \tag{A.42} \]

The m.g.f. can be calculated directly from the definition, but it is easier to derive it from (A.35):

\[ M_{\mathrm{Poisson}(\lambda)}(t) = P_{\mathrm{Poisson}(\lambda)}(e^t) = e^{\lambda(e^t - 1)}. \tag{A.43} \]

Here is an example to illustrate the use of m.g.f.'s in deducing distributions.


Example A.1 Suppose X has a Poisson(λ) distribution and Y has a Poisson(μ) distribution. Find
the distribution of an independent sum Z = X + Y.


Solution. Directly from (A.43) and Theorem A.1,

\[ M_Z(t) = M_X(t)\,M_Y(t) = e^{(\lambda+\mu)(e^t - 1)}, \]

and we conclude from Theorem A.2 that Z has a Poisson(λ + μ) distribution.


This example illustrates a familiar phenomenon. For many common distributions, the 
sum of an independent collection is just another distribution of the same type but with a 
different parameter. In the Poisson case, since the parameter equals the mean, it is clear that 
the parameter for the sum of the random variables must be the sum of the parameters. 
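
Both facts in this subsection, the binomial limit that defines the Poisson distribution and the additivity shown in Example A.1, can be checked numerically. The following Python sketch is one way to do so; the parameter values 2 and 3 and the ranges of k and n examined are arbitrary choices made for illustration.

```python
import math

def binom_pmf(k, m, p):
    """Probability function of Bin(m, p)."""
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)

def poisson_pmf(k, lam):
    """Probability function of Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, mu = 2.0, 3.0  # illustrative parameter values

def max_gap(m):
    """Largest difference between Bin(m, lam/m) and Poisson(lam) over k = 0..10."""
    return max(abs(binom_pmf(k, m, lam / m) - poisson_pmf(k, lam)) for k in range(11))

gaps = [max_gap(m) for m in (10, 100, 1000, 10000)]

def sum_pmf(n):
    """P(X + Y = n) for independent X ~ Poisson(lam), Y ~ Poisson(mu)."""
    return sum(poisson_pmf(k, lam) * poisson_pmf(n - k, mu) for k in range(n + 1))

additivity_error = max(abs(sum_pmf(n) - poisson_pmf(n, lam + mu)) for n in range(20))

print(gaps, additivity_error)
```

The gaps shrink roughly in proportion to 1/m, and the probabilities of the sum match those of Poisson(λ + μ) up to floating-point rounding.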


A.11.3 The negative binomial and geometric distributions 


Let us again take repeated trials of an experiment where the probability of a 'success' is p.
Now, however, instead of taking a fixed number of repetitions, we continue until we get a
'failure'. We then count the number of successes. This is a random variable with

\[ p(k) = p^k(1 - p), \quad k = 0, 1, \ldots. \]

It is known as the geometric distribution with parameter p (the name is chosen to reflect the
fact that the values of the probability function form a geometric progression) and will be denoted
by Geom(p). If N is such a random variable,

\[ P(N \ge m) = (1 - p)\sum_{k=m}^{\infty} p^k = p^m. \]

It follows that, for positive integers r and t,

\[ P(N \ge t + r \mid N \ge r) = \frac{p^{t+r}}{p^{r}} = p^t = P(N \ge t). \tag{A.44} \]

This says that at any point, the probability that you will continue for a further t repetitions
is independent of the number of repetitions you have already taken, sometimes referred to as
a memoryless feature of the distribution.
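
The memoryless property (A.44) can be verified exactly for particular values. In the sketch below, the success probability 3/5 and the values r = 4, t = 3 are illustrative assumptions; the tail probability is obtained from the complement of a finite sum, so no infinite series needs to be truncated.

```python
from fractions import Fraction as Fr

p = Fr(3, 5)  # illustrative success probability

def pmf(k):
    """P(N = k) = p^k (1 - p): k successes before the first failure."""
    return p**k * (1 - p)

def tail(m):
    """P(N >= m), computed as 1 - P(N <= m - 1)."""
    return 1 - sum(pmf(k) for k in range(m))

r, t = 4, 3
conditional = tail(t + r) / tail(r)   # P(N >= t + r | N >= r)

print(conditional, tail(t), p**t)
```

All three printed values coincide (27/125 for these inputs), as (A.44) asserts.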

Suppose we repeat trials as above, except that instead of continuing until we get a single
failure we continue until we get r failures. The probability function for the number of successes
is now given by

\[ p(k) = \binom{r+k-1}{k} p^k (1 - p)^r = \frac{r(r+1)\cdots(r+k-1)}{k!}\,p^k (1 - p)^r, \quad k = 0, 1, \ldots. \tag{A.45} \]

It is clear from the second expression that r can be any positive number and not just an integer.
We no longer get the interpretation of repeated trials as given but we still have a perfectly valid
distribution. It is known as the negative binomial distribution with parameters r and p, and
we denote it by Negbin(r, p). It is generated as in (A.36) from the function g(t) = (1 − pt)^{−r},
which is the source of the name of the distribution. From (A.37) to (A.39) we have that

\[ P_{\mathrm{Negbin}(r,p)}(t) = \left(\frac{1 - p}{1 - pt}\right)^{r}, \quad E[\mathrm{Negbin}(r, p)] = \frac{rp}{1 - p}, \quad \mathrm{Var}[\mathrm{Negbin}(r, p)] = \frac{rp}{(1 - p)^2}. \tag{A.46} \]

The above formulas give the same quantities for the geometric distribution, simply by taking
r = 1.

There is another parameterization of the negative binomial that is convenient for certain 
purposes. In place of p, we use the quantity a = p/(1 — p). Then 1/(1 — p) = 1 + a. In terms 
of this parameter, we can write 


\[ E[\mathrm{Negbin}(r, p)] = ra, \quad \mathrm{Var}[\mathrm{Negbin}(r, p)] = ra(1 + a). \]


A.11.4 The continuous uniform distribution 


Suppose we choose a point at random in an interval (a, b). If we decide that any interval of a
given length h is equally likely regardless of where the interval starts, then we have a uniform
distribution U. It has a continuous density function given by

\[ f_U(t) = \frac{1}{b - a}, \quad a < t < b, \]

and direct integration shows that

\[ E(U) = \frac{a + b}{2}, \quad \mathrm{Var}(U) = \frac{(b - a)^2}{12}. \tag{A.47} \]


A.11.5 The normal distribution 


One of the most widely used distributions in probability and statistics is the normal. The 
standard normal distribution is that with density function given by 


\[ f_Z(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \quad -\infty < x < \infty, \]

whose graph is the familiar bell-shaped curve. This is one instance where we have a random
variable that takes negative values. For a standard normal random variable Z,

\[ E(Z) = 0, \quad \mathrm{Var}(Z) = 1, \]

as shown by (A.32) and (A.49) below. The complete family of normal random variables
consists of those of the form

\[ \mu + \sigma Z, \]

where μ is any constant, σ > 0, and Z has a standard normal distribution. Such a random
variable will have mean μ and variance σ².

The importance of normal distributions comes from the famous central limit theorem.
Suppose we have an independent sequence X_1, X_2, ... of random variables, each with the
same distribution. Then the averages S_n = (X_1 + X_2 + ... + X_n)/n, suitably centred and
scaled, converge in some sense to a normal distribution. (We will not give precise details here.)
As shown below in Section A.12 it can be difficult to compute the distribution of a sum of
random variables. In many cases one invokes the central limit theorem to approximate the
distribution of a sum by assuming it is normal, and then only the mean and variance need be known.

Let Φ denote the cumulative distribution function of the standard normal. The integration
to compute this from the density function must be done by some numerical method. It used
to be customary to publish tables of various values of Φ, but now it can be done by various
computer programs. In Excel®, the formula =NORMSDIST(t) returns the value Φ(t). For
example, =NORMSDIST(0) returns 0.5, which is obvious since the distribution is symmetric
about the mean of 0. For another example, =NORMSDIST(1.645) returns approximately 0.95,
meaning that with probability 0.95 a standard normal takes a value less than or equal to 1.645.

In most applications we are interested in the inverse calculation. Given α we want to
find the α-quantile of the distribution, that is, the point t such that the probability of being less
than or equal to t is α. This is Φ^{−1}(α), which in Excel® is denoted by NORMSINV(α). For example,
taking α = 0.95, the formula =NORMSINV(0.95) returns approximately 1.645, as we have verified.
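
The same two calculations can be carried out outside a spreadsheet. As a sketch, Python's standard library provides `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of Φ and Φ^{−1}; the values μ = 10 and σ = 2 used for the general normal are arbitrary illustrative choices.

```python
from statistics import NormalDist

Z = NormalDist()             # standard normal: mean 0, standard deviation 1
phi_0 = Z.cdf(0.0)           # value of the c.d.f. at the mean
phi_1645 = Z.cdf(1.645)      # close to 0.95
q95 = Z.inv_cdf(0.95)        # the 0.95-quantile, close to 1.645

# General normal X = mu + sigma * Z, with illustrative parameters.
mu, sigma = 10.0, 2.0
X = NormalDist(mu, sigma)
q95_x = X.inv_cdf(0.95)      # should agree with mu + sigma * q95

print(phi_0, round(phi_1645, 4), round(q95, 4), round(q95_x, 4))
```

The last value illustrates the relation between quantiles of a general normal and those of the standard normal: the quantile of X is the standard quantile scaled by σ and shifted by μ.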

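For the case of a general normal X = μ + σZ we have

\[ X \le x \iff \mu + \sigma Z \le x \iff Z \le (x - \mu)/\sigma, \]

from which it follows that

\[ F_X(x) = \Phi\left(\frac{x - \mu}{\sigma}\right), \quad f_X(x) = \frac{1}{\sigma}\,f_Z\left(\frac{x - \mu}{\sigma}\right), \tag{A.48} \]

where f_Z is the standard normal density and we invoke (A.6) for the second equation. It follows
moreover that the α-quantile of X is given by μ + Φ^{−1}(α)σ.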
To calculate the m.g.f. we start with Z, the standard normal. Then

\[ M_Z(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tx}\,e^{-x^2/2}\,dx. \]

We handle this by the familiar 'completing the square' trick. The exponent in the integrand
is −½[x² − 2tx] = −½[(x − t)² − t²]. Since x is the variable of integration here, the
terms involving only t can be factored out, and we write

\[ M_Z(t) = e^{t^2/2}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(x-t)^2/2}\,dx\right]. \]

Now the term inside the bracket is just the integral over the entire range of the density
function of the normal distribution with mean t and standard deviation 1, as we see from
(A.48), so it must equal 1. We conclude that

\[ M_Z(t) = e^{t^2/2}. \tag{A.49} \]


To handle the general case, we will use some formulas involving modifications by constants.
For any constant a,

\[ M_{X+a}(t) = E[e^{t(X+a)}] = e^{ta}E[e^{tX}] = e^{ta}M_X(t). \tag{A.50} \]

For any constant b,

\[ M_{bX}(t) = E[e^{tbX}] = M_X(bt). \tag{A.51} \]

The general normal random variable X with mean μ and standard deviation σ is distributed
as μ + σZ, so that by combining (A.49)-(A.51), we obtain

\[ M_X(t) = e^{\mu t + \sigma^2 t^2/2}. \tag{A.52} \]


A.11.6 The gamma and exponential distributions 


The gamma distribution is produced from the gamma function, defined by

\[ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1}e^{-x}\,dx, \quad \alpha > 0. \]

We cannot evaluate this integral in terms of elementary functions. We can, however, get some
information by integrating by parts. For α > 1,

\[ \Gamma(\alpha) = -x^{\alpha-1}e^{-x}\Big|_0^{\infty} + (\alpha - 1)\int_0^{\infty} x^{\alpha-2}e^{-x}\,dx = (\alpha - 1)\Gamma(\alpha - 1). \tag{A.53} \]

Since Γ(1) = 1, it follows by induction that for any positive integer n,

\[ \Gamma(n) = (n - 1)! \]

Choose any β > 0 and consider a more general integral,

\[ \int_0^{\infty} x^{\alpha-1}e^{-\beta x}\,dx = \frac{\Gamma(\alpha)}{\beta^{\alpha}}, \]

which follows by making a change of variable from x to βx. From this, we define a continuous
positive-valued random variable with the density function


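\[ f_X(x) = \frac{\beta^{\alpha}x^{\alpha-1}e^{-\beta x}}{\Gamma(\alpha)}, \quad x > 0. \]

This is known as a gamma distribution with parameters α and β and will be denoted by
Gamma(α, β). For such a distribution X, and for t < β,

\[ M_X(t) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\int_0^{\infty} x^{\alpha-1}e^{-(\beta-t)x}\,dx = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\cdot\frac{\Gamma(\alpha)}{(\beta - t)^{\alpha}} = \left(\frac{\beta}{\beta - t}\right)^{\alpha}. \tag{A.54} \]

Evaluating derivatives at 0 then gives

\[ E(X) = \frac{\alpha}{\beta}, \quad E(X^2) = \frac{\alpha(\alpha + 1)}{\beta^2}, \quad \mathrm{Var}(X) = \frac{\alpha}{\beta^2}. \tag{A.55} \]

The special case of the gamma when α = 1 arises frequently. It is known as an exponential
distribution and will be denoted by Exp(β). For such a distribution X we have from above that

\[ f_X(t) = \beta e^{-\beta t}, \quad s_X(t) = e^{-\beta t}, \quad M_X(t) = \frac{\beta}{\beta - t}, \quad E(X) = \frac{1}{\beta}, \quad \mathrm{Var}(X) = \frac{1}{\beta^2}. \tag{A.56} \]

This is the unique continuous distribution which has the same 'memoryless' property that
we observed in the discrete case for the geometric distribution, namely, for any nonnegative t
and r,

\[ P(X > t + r \mid X > r) = P(X > t). \tag{A.57} \]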


A.11.7 The lognormal distribution 


Another commonly used positive-valued distribution is the lognormal. A random variable Y
is said to have a lognormal distribution if Y = e^X where X has a normal distribution. That is,
just as the name suggests, the logarithm of Y is normal. We can calculate the moments of this
distribution from (A.52). Namely, if X ~ N(μ, σ²) (the normal distribution with mean μ and
variance σ²), then

\[ E(Y) = E(e^{X}) = M_X(1) = e^{\mu + \sigma^2/2}, \tag{A.58} \]

and

\[ E(Y^2) = E(e^{2X}) = M_X(2) = e^{2\mu + 2\sigma^2}, \tag{A.59} \]

from which we obtain

\[ \mathrm{Var}(Y) = E(Y^2) - (E(Y))^2 = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right). \]


A.11.8 The Pareto distribution 


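The probability density function of the Pareto(θ, α) distribution is proportional to
(x + θ)^{−(α+1)} for x > 0. Since

\[ \int_0^{\infty} (x + \theta)^{-(\alpha+1)}\,dx = \frac{\theta^{-\alpha}}{\alpha}, \]

this density function is given by

\[ f(x) = \frac{\alpha\theta^{\alpha}}{(x + \theta)^{\alpha+1}}. \]

For this distribution,

\[ s(x) = \alpha\theta^{\alpha}\int_x^{\infty} (y + \theta)^{-(\alpha+1)}\,dy = \left(\frac{\theta}{x + \theta}\right)^{\alpha}, \]

and, provided α > 1,

\[ E(X) = \int_0^{\infty} s(x)\,dx = \frac{\theta}{\alpha - 1}. \]

Further integration leads to

\[ E(X^2) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)}, \]

provided α > 2.


In general, E(X^k) becomes infinite for k ≥ α.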


A.12 Convolution 
A.12.1 The discrete case 


Convolution is a tool for finding the distribution of a sum of random variables. Consider 
a simple example. Suppose X takes the values 2, 5 and 8 with probabilities 0.5, 0.3 and 
0.2, respectively, while Y takes the values 3, 6 and 9, with probabilities 0.6, 0.3 and 0.1, 
respectively, and that X and Y are independent. Let Z = X + Y. What is the probability that 
Z = 5? This can happen if and only if X = 2 and Y = 3. By the independence assumption, we 
multiply to get the probability of both these occurrences, and we see that 


\[ f_Z(5) = 0.5 \times 0.6 = 0.30. \]

What is the probability that Z = 8? There are two mutually exclusive ways that this can occur,
namely, X = 2, Y = 6 and X = 5, Y = 3. We deduce that

\[ f_Z(8) = f_X(2)f_Y(6) + f_X(5)f_Y(3) = 0.5 \times 0.3 + 0.3 \times 0.6 = 0.33. \]

For the probability that Z = 11, we must take three terms:

\[ f_Z(11) = f_X(2)f_Y(9) + f_X(5)f_Y(6) + f_X(8)f_Y(3) = 0.5 \times 0.1 + 0.3 \times 0.3 + 0.2 \times 0.6 = 0.26. \]

Proceeding similarly, we can calculate f_Z(14) = 0.09 and f_Z(17) = 0.02, completing the
distribution.

The general rule can be written as follows. Let X and Y be independent random variables, 
with nonnegative integers as values, and let Z = X + Y. Then 


\[ f_Z(n) = \sum f_X(k)\,f_Y(m), \tag{A.60} \]


where the sum is taken over all pairs (k, m) for which k + m = n. 

Formula (A.60) is fine for small examples like this, but in cases where the random variables 
take on many more values (perhaps even an infinite number) we want to write the formula in 
a more systematic way, which will ensure that we do not miss any combinations. In order for 
a sum of nonnegative integers k + m to sum to a nonnegative integer n, we can have k take 
any value from 0 to n and then m must take the value n — k. We can then write (A.60) as 


\[ f_Z(n) = \sum_{k=0}^{n} f_X(k)\,f_Y(n - k). \tag{A.61} \]


Equation (A.61) is the general so-called convolution formula. Note, however, that for small 
problems like the one above, which we want to do by hand calculation, it can be inefficient 
compared to (A.60) since we will be adding up many terms of 0. 
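
The hand calculation at the start of this section follows the pairwise rule (A.60), and it translates directly into a short program. In the Python sketch below, each probability function is stored as a dictionary from values to probabilities, so only the nonzero terms are visited; the function name `convolve` is our own choice.

```python
def convolve(f, g):
    """Probability function of X + Y for independent X and Y, following (A.60).

    f and g map each possible value to its probability.
    """
    h = {}
    for k, pk in f.items():
        for m, pm in g.items():
            h[k + m] = h.get(k + m, 0.0) + pk * pm
    return h

f_x = {2: 0.5, 5: 0.3, 8: 0.2}
f_y = {3: 0.6, 6: 0.3, 9: 0.1}
f_z = convolve(f_x, f_y)

for z in sorted(f_z):
    print(z, round(f_z[z], 2))
```

This reproduces the five probabilities 0.30, 0.33, 0.26, 0.09 and 0.02 found above for Z = 5, 8, 11, 14 and 17.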

Suppose we want the probability that Z ≤ n. We can reason in the same way. We need X
to take a value k from 0 to n and now Y must take a value less than or equal to n − k, so we
have

\[ F_Z(n) = \sum_{k=0}^{n} f_X(k)\,F_Y(n - k). \tag{A.62} \]

It is not hard to verify that we can interchange F and f and also write (A.62) as

\[ F_Z(n) = \sum_{k=0}^{n} F_X(k)\,f_Y(n - k). \tag{A.63} \]

This follows simply by interchanging the roles of X and Y and changing the variable of
summation from k to n − k. The reader is warned that we cannot write
F_Z(n) = Σ_{k=0}^{n} F_X(k)F_Y(n − k).

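As an illustration of convolution, we will redo Example A.1, which we did before by m.g.f.'s,
now using convolution.

From (A.61),

\[ f_Z(n) = \sum_{k=0}^{n} \frac{e^{-\lambda}\lambda^k}{k!}\cdot\frac{e^{-\mu}\mu^{n-k}}{(n-k)!} = \frac{e^{-(\lambda+\mu)}(\lambda+\mu)^n}{n!}\sum_{k=0}^{n}\binom{n}{k}\left(\frac{\lambda}{\lambda+\mu}\right)^{k}\left(\frac{\mu}{\lambda+\mu}\right)^{n-k}. \]

The summation on the right hand side is just a binomial expansion

\[ \left(\frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\right)^{n} = 1, \]

and we conclude that Z has a Poisson(λ + μ) distribution.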

Suppose we want the sum of more than two discrete random variables. For example, given 
independent random variables X, Y, W, find the distribution of V = X + Y + W. For cases 
where there are few nonzero values, we can proceed just as in (A.60) and write 


\[ f_V(n) = \sum f_X(k)\,f_Y(m)\,f_W(p), \]


where the sum is taken over all ordered triples (k,m,p) such that k + m + p = n. For the 
general case, however, it may not be so easy to pick out all such triples, and we have to 
proceed more systematically. The basic procedure is to iterate the calculation. We first apply 
(A.60) or (A.61) to find the distribution of Z = X + Y and then apply it again to find the 
distribution of V = Z + W. For the general case of a sum of n independent random variables, 
we just iterate this n — 1 times. 


Example A.2 Let Z = Σ_{i=1}^{5} X_i, where the X_i's are independent random variables, each
taking the value 0 with probability 0.7, 1 with probability 0.2 and 2 with probability 0.1. Find
the probability that Z = 2.


Solution. We can derive this single number by inspection, without having to do the four
iterations. One way for the five values to add up to 2 is that one of them is 2 and the rest are
zero. Since there are five possibilities for the 2, the probability of this is 5 × 0.7^4 × 0.1. The
only other way is for three of the values to be zero and the other two to be 1. There are 10 ways
to choose which two values equal 1, so the probability of this is 10 × 0.7^3 × 0.2^2. So

\[ P(Z = 2) = 5 \times 0.7^4 \times 0.1 + 10 \times 0.7^3 \times 0.2^2 = 0.2573. \]
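
For comparison, the systematic route through four iterations of (A.61) is easy to carry out by machine. The sketch below stores each probability function as a list indexed by value; it is an illustrative implementation rather than a prescribed method, and `convolve` is our own helper.

```python
def convolve(f, g):
    """Convolution (A.61) for probability functions stored as lists indexed by value."""
    h = [0.0] * (len(f) + len(g) - 1)
    for k, fk in enumerate(f):
        for m, gm in enumerate(g):
            h[k + m] += fk * gm
    return h

f = [0.7, 0.2, 0.1]       # P(X_i = 0), P(X_i = 1), P(X_i = 2)
dist = f
for _ in range(4):        # four convolutions give the sum of five variables
    dist = convolve(dist, f)

prob_z_equals_2 = dist[2]
print(round(prob_z_equals_2, 4))
```

The result agrees with the value 0.2573 obtained by inspection, and `dist` also holds the probabilities of every other value of Z from 0 to 10.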


A.12.2 The continuous case 


We now suppose that X and Y are independent, continuous, nonnegative random variables,
and we want the distribution of Z = X + Y. If X takes the value x and Y takes the value y, the
sum will be less than or equal to s if and only if the point (x, y) lies in the region bounded by
the x and y axes, and the line y = s − x. Integrating the joint density function over this region
gives

\[ F_Z(s) = \int_0^{s}\!\!\int_0^{s-x} f_{X,Y}(x, y)\,dy\,dx. \tag{A.64} \]

Equation (A.64) is in fact true without the independence assumption, but with the assumption
of independence the integral simplifies considerably. The joint density function factors as
f_X(x)f_Y(y), so that, with f_X(x) taken outside, the inner integral becomes

\[ \int_0^{s-x} f_Y(y)\,dy = F_Y(s - x), \]

and we can write

\[ F_Z(s) = \int_0^{s} f_X(x)\,F_Y(s - x)\,dx, \tag{A.65} \]

a direct analogue of (A.62).

Using Leibniz's rule for differentiating integrals and the fact that F_Y(0) = 0, we have

\[ f_Z(s) = \int_0^{s} f_X(x)\,f_Y(s - x)\,dx, \tag{A.66} \]


a direct analogue of (A.61). 
The following example serves to indicate that calculating convolutions for continuous 
distributions can be a very involved procedure, even in the simplest cases. 


Example A.3 Find the distribution of an independent sum Z = X + Y, where X has a 
uniform distribution on the interval [0, 2] and Y has a uniform distribution on the interval 
[0, 3]. 


Solution. Whether we use (A.65) or (A.66) depends on the particular example. In this case,
it is easier to use (A.66). See Figure A.1, and notice that it depicts a region of the (x, s) plane
(rather than the (x, y) plane). It is that portion of the positive quadrant in the (x, s) plane
given by 0 ≤ s ≤ 5, 0 ≤ x ≤ s. A distribution that is uniform on an interval [a, b] has a density
function that is a constant (b − a)^{−1} on this interval and zero elsewhere. Consequently, the
integrand in (A.66) takes the value 1/6 when 0 ≤ x ≤ 2 and 0 ≤ s − x ≤ 3, as indicated by the
union of regions R_1, R_2, R_3, and it takes the value 0 elsewhere, as indicated by the union
of regions S_1, S_2, S_3. Now we consider in turn all possible values of s and the value of the
integral in (A.66).


Figure A.1 The region of integration in Example A.3


If 0 ≤ s ≤ 2, the value of x in R_1 will vary from 0 to s, so that the integrand will be 1/6 for
all x. Therefore,

\[ f_Z(s) = \frac{s}{6}, \quad 0 \le s \le 2. \]

If 2 < s ≤ 3, the value of x in R_2 will vary from 0 to 2, so that the integrand will be 1/6
when x ≤ 2 and zero when 2 < x ≤ s. Therefore,

\[ f_Z(s) = \frac{2}{6} = \frac{1}{3}, \quad 2 < s \le 3. \]

If 3 < s ≤ 5, then the value of x in R_3 will vary from s − 3 to 2, so that the integrand will
be 1/6 when s − 3 ≤ x ≤ 2, but zero when 0 ≤ x < s − 3 or when 2 < x. Therefore,

\[ f_Z(s) = \frac{5 - s}{6}, \quad 3 < s \le 5. \]

Of course, since Z varies from 0 to 5, we know that

\[ f_Z(s) = 0, \quad s < 0 \text{ or } s > 5. \]
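
The piecewise density found in Example A.3 can be checked against a direct numerical evaluation of (A.66). The sketch below uses a simple midpoint rule; the number of steps and the sample points are arbitrary choices, and the approximation is only accurate to roughly the step size because the integrand has jumps.

```python
def uniform_density(a, b):
    """Density of the uniform distribution on [a, b]."""
    return lambda x: 1.0 / (b - a) if a <= x <= b else 0.0

f_x = uniform_density(0.0, 2.0)
f_y = uniform_density(0.0, 3.0)

def f_z_numeric(s, steps=20000):
    """Midpoint-rule approximation of (A.66) over the interval [0, s]."""
    if s <= 0:
        return 0.0
    h = s / steps
    return h * sum(f_x((i + 0.5) * h) * f_y(s - (i + 0.5) * h) for i in range(steps))

def f_z_exact(s):
    """Piecewise density derived in Example A.3."""
    if 0 <= s <= 2:
        return s / 6
    if 2 < s <= 3:
        return 1 / 3
    if 3 < s <= 5:
        return (5 - s) / 6
    return 0.0

points = (0.5, 1.0, 2.5, 3.0, 4.0, 4.9, 6.0)
errors = [abs(f_z_numeric(s) - f_z_exact(s)) for s in points]
print(max(errors))
```

The largest discrepancy over the chosen points is far below 0.001, consistent with the trapezoid-shaped density derived above.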


A.12.3 Notation and remarks 


Suppose that X has probability (or density) function f, and distribution function F, while Y 
has probability (or density) function g and distribution function G. 

The probability (or density) function given in (A.61) (or (A.66)) is denoted by f * g.
The distribution function given in (A.62) (or (A.65)) is denoted by F * G. One must take
care with this latter notation as it could induce the error that the reader was warned about
above, of using the c.d.f. in both places when calculating the c.d.f. of the sum. Similarly,
given a sum of n independent random variables with the same distribution, we use f^{*n} for the
probability (density) function of the sum and F^{*n} for the distribution function of the sum. To
illustrate, in Example A.2 we would denote the desired answer as f^{*5}(2), where f was the
given probability function.

As a final remark in this section, we note that all of the above formulas hold for random 
variables that take negative values, with the exception that we must replace the lower limit on 
the integrals and sums by −∞.


A.13 Mixtures 


Suppose you are faced with a choice of two games to play. Game 1 has a return of either
2 or −2 each with probability 1/2, while game 2 has a return of 4 with probability 1/4, 2
with probability 1/4 or −3 with probability 1/2. Racked with indecision, you flip a coin,
intending to play game 1 if a head turns up, or game 2 if a tail comes up. What you are really
playing is a mixture of game 1 and game 2 with equal weights. The resulting return is easily
calculated to be 4 with probability 1/8, 2 with probability 3/8, −2 with probability 2/8 and −3
with probability 2/8. More generally, we could have n random variables X_1, X_2, ..., X_n and a
probability distribution on {1, 2, ..., n}, called the mixing distribution. The resulting mixture
is a random variable X that takes a value from X_i with probability p(i), where p is the probability
function of the mixing distribution. (Our initial example had p(1) = p(2) = 1/2.) Calculating
quantities with respect to the mixed distribution presents no problems as everything follows
the same convex combination. We have, for example,

\[ f_X(s) = \sum_{k=1}^{n} p(k)\,f_{X_k}(s), \quad F_X(s) = \sum_{k=1}^{n} p(k)\,F_{X_k}(s), \quad M_X(s) = \sum_{k=1}^{n} p(k)\,M_{X_k}(s). \tag{A.67} \]
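
The two-game example can be reproduced with the first formula in (A.67). In the following sketch the probability functions are dictionaries and `mix` is a hypothetical helper name; exact fractions are used so the output can be compared with the probabilities quoted above.

```python
from fractions import Fraction as Fr

game1 = {2: Fr(1, 2), -2: Fr(1, 2)}
game2 = {4: Fr(1, 4), 2: Fr(1, 4), -3: Fr(1, 2)}
weights = [Fr(1, 2), Fr(1, 2)]           # mixing distribution p(1), p(2)

def mix(components, weights):
    """Probability function of the mixture: the weighted sum of the component functions."""
    out = {}
    for dist, w in zip(components, weights):
        for value, prob in dist.items():
            out[value] = out.get(value, 0) + w * prob
    return out

mixture = mix([game1, game2], weights)
mean = sum(v * p for v, p in mixture.items())

for value in sorted(mixture, reverse=True):
    print(value, mixture[value])
```

The mixture assigns probabilities 1/8, 3/8, 1/4 and 1/4 to the returns 4, 2, −2 and −3, and its mean is 0, the same convex combination of the two component means (each of which is 0).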


Let us consider a slightly trickier situation, in which we have continuous mixing. Suppose
now that we have a whole family of random variables X_r, either all discrete or all continuous,
indexed on all the nonnegative reals [0, ∞) instead of just the integers. For a mixing distribution,
we take a continuous nonnegative random variable with density function p. The resulting
mixture is a random variable X with density (or probability) function

\[ f_X(s) = \int_0^{\infty} p(r)\,f_{X_r}(s)\,dr. \tag{A.68} \]


Notation index


The following is a list of the major symbols which are used in the book. For the most part, 
the first page they appear on is listed. An exception is the notation in Appendix A, where the 
first appearance in that chapter is noted. This list excludes that part of the standard actuarial 
notation which is not used in the main body of the text. The latter can be found in the 
appropriate sections of Chapters 2—6, 8 and 10 entitled ‘Standard Notation and Terminology’. 


Chapter 2 


a(e;v) 15 
axb 
B,(c;v) 21 
B,(c) 25 
ye 21 

ke 21 
cok 25 
d, 12 

e 17 

i, 12 
jV(e;v) 21 
wk) 11 
v(k,n) 10 
vok 25 
Val,(e;v) 26 
Ab 17 
Vb 18 


Chapter 3 
d, 39 


X 


e. 42 


x 


e 42 
jp, 40 
1d, 40 
p, 40 
q, 40 
C, 39 


ao 40 
Chapter 4 


ü,(c) 49 
(l4) 51 
yk) 50 
Gyn) 52 


Chapter 5 


A,(b) 62 
w,(k) 69 
Aix 70 
Chapter 6 


Nk 81 
V 78


Chapter 7
a™e;y) 99 


a™(e;y) 104 


a™(e) 101 
d 100 
i" 100 
a(m) 102 
Bim) 102 


Chapter 8 


ac;y) 113 
a,(c) 115 
A,(b) 119 
a(oo) 116 
f(co) 116 
ô (t) 113 
ô 114 
A,(t) 119 
u(x) 118 
H(t) 118 
Chapter 9 
Cige 139 
sP[x]- 138 
sd [x]- 137 
quju 138 


Chapter 10 


üj(c) 145 
ü,(c) 145 
üz(c) 147 
A (€) 148 
Ax (c) 

Asc) 

Ág(c) 148 
Al (b) 152 
AZ (b) 152 
Al (b) 153 
AZ (b) 153 
nPxy 143 
ndxy 146 
niyy 153 
dh 147 
Myy(t) 147 


`~ 


`~ 


Chapter 11 


uPA 165 
Chapter 12 


AS, 187 
Pr, 190 
II, 194 


Chapter 13 


AV, 201 
COL, 202 
S, 205 


Chapter 14 


fÐ 220 

s (t) 220 
Tou 216 
T 217 
Chapter 15 


az(e;v) 229 
ār(c,v) 231 
A7z(b,v) 225 
Áq(b;v) 226 
CV(X) 238 
„L 234 

L 235 


Chapter 16 


Py 252 
f 252 


Chapter 17 


ÍT Ta.. ny fos 


FT T... Ty ts o 


..tQ,) 260 
s, 260 


ST, Tz, Ty CL fa + 


Fr jj) 262 
frst j) 264 

sp g(tj) 264 
Ur y(tj) 266 


Chapter 18 


Pry(k,n) 284 
p, 284 
o(h) 294 

~ 284 


Chapter 19 


pus.) 314 
mÐ 312 


Chapter 20 


E,(W) 350 
EQ(W) 366 


Chapter 21 


(N,X) 379 
S+ 384 
X-d), 388 
X^d 388 


Chapter 22 


VaR 413 
TVaR, 413 


tQ) 260 
Chapter 23


D(u) 423 
J(u) 433 

£L 440 
y(u) 420 
v,Q) 436 
w(u,t) 420 


Chapter 24 


E(X|Y) 453 
Var(X|Y) 455 
« 459 


Appendix A 


P(A) 477 
P(A|B) 479 
f_X(x) 480
F_X(x) 480
E(X) 481
Var(X) 481
s(t) 482
Cov(X,Y) 484
E(X|B) 486
f*g 499
F*G 499
f^{*n} 499
F^{*n} 499
M_X(t) 486
P_X(t) 487
Index


accumulated value 14, 34 

actuarial equivalence 13-15 

actuarial present value 225 

adjustment coefficient 429-31, 440, 445 
aggregate mortality 139, 219

American option 338, 346-8
amortization 25 


annuity 
cash refund 69 
certain 58 


continuous 112-13, 232-3 
deferred 33, 49, 55-6
due 33, 104 
guaranteed 54 
immediate 33, 104 
instalment refund 60, 69 
joint-life 146 
last survivor 147-8 
life 47-8, 150-1, 229-31 
(m)thly 98, 101-4 
reversionary 152, 158 
temporary 49, 160, 248 
variable 203-4 
whole life 49, 51, 57
arbitrage 31, 334 
arbitrage-free market 334-7, 342, 343, 
351, 367-8 
Arrow’s optimal insurance theorem 411 
asset share 188, 191 


associated single decrement table 175-81 


Balducci hypothesis 109 

binomial tree 343, 348 
Black-Scholes-Merton formula 361-4 
Brownian motion 295-9, 362 


call option 338, 344, 346, 362 
cash surrender value 88 
cash flow vector 7, 13, 78 
central limit theorem 492 
central rate of mortality 134 
Chapman-Kolmogorov equations 313 
Chebyshev's inequality 432, 482 
collective risk model 377 
comonotonic 415 
complete market 359-61 
common shock model 271-3, 321-2 
compound distribution 377-98 
compound Poisson 

distribution 379 

process 438-40 
concave function 404, 406-7 
conditional expectation 366, 453-5 
conditional probability 450, 479 
conditional variance 453-4 
conditional tail expectation (CTE) 414 
conjugate prior 467 
constant force assumption 132 
constant force of mortality 126-7, 215 
contingency loading 239 
contingent insurances 153-5 
convex
function 405 
order 417 
set 353 
convolution 381, 394, 495-500
cost of insurance 200, 201 
copula 273-6 
counting process 293 
covariance 484 
credibility 
Bayesian 457-9 
Bühlmann 463-4
Bühlmann-Straub 464-5
exact 465-8 
cumulative distribution function 480 


deductible 388-92 
deferred 
annuity 33, 49, 55-6 
contract 233 
insurance 73, 255 
deficit at ruin 422, 432 
defined benefit plan (DB) 57, 204 
defined contribution plan (DC) 57, 
206-7 
density function 480 
De Moivre’s law 127
difference formula 24 
differential equation 124-5, 313, 324 
discount 
force of 113 
function 9-11 
rate 12 
distortion risk measure 417 
distribution 
beta 474 
binomial 382, 489 
exponential 215, 494 
gamma 383, 398, 493-4 
geometric 291, 490 
Gompertz 215 
lognormal 494 
Makeham 216 
negative binomial 382-3, 398, 
490-1 
normal 238, 296, 491 
Pareto 384, 495 


Poisson 294, 382, 489 

uniform 215, 491 

Weibull 223 
distribution function 480 
dividends 85 


endowment 

identity 71 

insurance 63-4, 248 

pure 48 
Euler’s method 125, 317 
European option 338 
expectation 481 
expenses 88, 184-6 
expense-augmented premium 185 
expense-augmented reserve 186 
expiration date 338 


Fackler reserve formula 83 
failure time 211 
first-death insurance 153 
force 

of decrement 170 

of discount 113 

of failure 149 

of interest 114 

of mortality 118 

of transition 312 
forward 

contract 30-1 

interest rates 32, 370

prices 30-2, 370 
frequency 378, 381-3 
full preliminary term 187 
fundamental theorem of asset pricing 352, 357-8


gains and losses 83-5, 191-3 
gambler’s ruin 427 

generational annuity table 142 
geometric Brownian motion 299, 362
Gompertz’s law 222 

gross premium 56, 185

gross premium reserve 188 


Hattendorff’s theorem 241
hazard rate 212 


increment 

independent 293 

stationary 293 
independence 479, 485 
individual risk model 378 
insurance 

casualty 377 

deferred 73, 254 

endowment 63-4, 248 

life 61-74, 225-9 

term 64, 278 

universal life 199-202

whole life 64 
intensity function 192 
intensity matrix 313 
interest 

constant 12-13 

force of 114 

nominal rate 100 

rate 12, 372
interest and survivorship 50-2 
internal rate of return 29 


Jensen’s inequality 407-8 

joint density function 483 

joint distribution 483-4 

joint distribution function 483-4 
joint-life status 144-6, 171 
joint survival function 260 


Kolmogorov equations 312, 317, 318, 
330 


lapse 88 
last survivor status 147 
life annuity 47-8, 150-1, 229-31 
life expectancy 42-3, 117-18, 221 
complete 42 
curtate 42 
temporary 43, 117 
life insurance 61-74, 225-9 
life table 39-45 
loss elimination ratio 401 
Lundberg’s inequality 445 


Makeham’s law 222 
Markov chain 282-4
finite state 287-92
limiting distribution 289
models for insurance and annuities 304
non-stationary 305
periodic 289
reducible 289
martingale 286-7, 352, 424
maximal aggregate loss 441-3
mean 469, 481
minimum random variable 259
mixtures 500
mode 215
modified reserve system 187
moments 481
moment generating function 486-7, 488


Monty Hall problem 451-2


mortality 
force of 113
rate 40 
table 39 
multi-state models 304-31 
multiple decrement 
models 166-83 
table 167 
multiplication rule 41 


net amount at risk 83 

net annual premium 53, 64 
net single premium 51 
nonforfeiture 88 


nonhomogeneous Poisson process 


nonidentifiability 268 


options 337-9 
American 338, 346-8 
call 338 
European 338 
embedded 333 
lookback 343 
put 338 


optional stopping theorem 426-7 


paid up reserve formula 91


pension plans 57, 204-7 
periodic Markov chain 289 
Poisson process 293-5 
premium 4, 47, 61, 122, 235-40
annual 53, 64, 255 
expense-augmented 185 
equivalence principle 23 
gross 56, 185 
net 56 
pattern vector 55 
percentile 236-7, 413 

posterior distribution 458 

premium difference reserve formula 90

premium principle 240-1 

present value 13, 51

prior distribution 458 

probability density function 480 

probability function 480 

probability generating function 487-8 

probability mass function 480 

probability measure 477-8 

profit margin 195 

profit signature 195 

profit testing 193-5 

prospective loss 234, 249 

prospective method 23 

pure endowment 48, 65 

put-call parity 342 


random variable 479 
continuous 480 
discrete 480 
random walk 284, 293 
recurrent state 290-3 
recursion formulas 24 
aggregate claims 199 
balances 25 
life expectancy 43 
reserves 24, 82, 188, 326 
ruin probabilities 434-8 
reserve 76-96, 187-9, 324-7
definition of 21 
differential equation 124-5, 327 
expense-augmented 186 
at fractional durations 107 
gross premium 188 
initial 83 
modified 187 
net premium 186 


prospective 23 
retrospective 23 
terminal 83 
Zillmerized 186
retrospective method 21 
risk averse 404, 408 
risk free 
bond 31 
rate of interest 334 
risk comparison 408-12
risk loading 239 
risk measures 412-17 
risk-neutral 340 
risk portion of premium 86 
Rothschild-Stiglitz 408 
ruin 420-48
functional equation approach 422 
martingale approach 424 
recursion formula 434-8
time of 420 


salary scales 205 

sample space 477 

savings portion of premium 86 
second-death insurance 148 
select and ultimate 139 

select mortality 137-40 

select period 138 


self-financing trading strategy 335, 337, 364
semi-Markov process 328 
severity 378, 383 
short selling 31 
sojourn probabilities 314-15 
spot rate of interest 32 
standard deviation 240, 482 
stationary increment 293 
stochastic process 281-303 
discrete-time 281 
continuous-time 293 
realization 282 
stop-loss reinsurance 392 
stopping time 424-6 
strike price 338 
St. Petersburg paradox 404 
submartingale 286 
supermartingale 287 


surplus process 
compound Poisson 438-40 
discrete 421 

survival 
distribution 211-16 
function 212, 214


tail value at risk (TVaR) 413-17
term structure of interest rates 32
Thiele’s differential equation 125, 317, 327
time value of money 8
total probability, law of 486
total expectation, law of 486
trading strategy 334
transient state 290-3
transition
matrix 287
probability 287


unearned premium 108
uniform distribution of deaths (UDD) 101-2


uniform seniority 223 
universal life 199-203 


utility 403-6 
exponential 417 
function 404 
power 405 


valuation 78 
premium 85 
value at time n 14
value at risk 413


variable annuity 203-4 


variance 224, 481 
volatility 362 


waiting times 295 


Woolhouse’s formulas 
yield 29
yield curve 32